Scale-out with tiup cluster scale-out fails, reporting a port conflict

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用scale-out扩容失败,提示端口冲突

| username: qufudcj

[TiDB Usage Environment] Test
[TiDB Version] 4.0.4
[Reproduction Path] What operations were performed to cause the issue
The tiup cluster scale-out command reports port conflicts that are completely unrelated to the node being added.
[Encountered Problem: Problem Phenomenon and Impact]
Currently, the drainer is hitting a memory bottleneck, so I plan to move it to a different machine. First I used tiup cluster stop pre-tidb-cluster -N to stop the only drainer in the current cluster; the plan is to scale out a new one and then use scale-in to remove the drainer on the struggling machine.
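
In outline, the intended sequence looks like this (a sketch only; the old drainer's node ID is a placeholder, since it was not shown above):

# stop the existing drainer (use the actual host:port shown by tiup cluster display)
tiup cluster stop pre-tidb-cluster -N <old-drainer-host>:8249

# add the new drainer described in scale-out.yaml
tiup cluster scale-out pre-tidb-cluster scale-out.yaml

# once the new drainer has caught up, remove the old one
tiup cluster scale-in pre-tidb-cluster -N <old-drainer-host>:8249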

Then I wrote the scale-out.yaml:

drainer_servers:
  - host: 172.16.12.165
    # user: "tidb"
    port: 8249
    ssh_port: 40022
    deploy_dir: "/alidata/tidb/deploy"
    data_dir: "data"
    config:
      syncer.db-type: "kafka"
      syncer.to.kafka-addrs: "***:9092,****:9092,****:9092"
      syncer.to.kafka-version: "1.0.2"
      syncer.to.kafka-max-messages: 1536
      syncer.to.kafka-max-message-size: 1610612736
      syncer.to.topic-name: "tidb-binlog-pre"

Running tiup cluster scale-out pre-tidb-cluster scale-out.yaml reported a port conflict, but the IP and port in the error message are completely unrelated to the drainer I want to add, and the cluster name is different as well. No ports are listening on the new drainer host.
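
As a quick sanity check (a sketch assuming standard Linux tools on the target host), you can confirm nothing is bound to the drainer port on 172.16.12.165:

# on 172.16.12.165: confirm nothing is listening on the port from scale-out.yaml
ss -tlnp | grep 8249
# netstat -tlnp | grep 8249   # alternative if ss is not installed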

This cluster was not created by me. By checking the cluster list, I found two clusters.


Checking both clusters, I found they are identical, which is confusing.
pre-test:

pre-tidb-cluster:

Now I don’t know what to do. I’m afraid of overwriting something, and I don’t understand why the two cluster names return the same information.

| username: tidb菜鸟一只 | Original post link

There are two folders in the /home/tidb/.tiup/storage/cluster/clusters directory, and apart from the names being different, everything else is the same?
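
A direct way to confirm this (a sketch based on the directory above and the two cluster names from the screenshots, assuming the default tiup storage layout) is to diff the two metadata files:

diff /home/tidb/.tiup/storage/cluster/clusters/pre-test/meta.yaml \
     /home/tidb/.tiup/storage/cluster/clusters/pre-tidb-cluster/meta.yaml
# no output means the two clusters' topology metadata are identical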

| username: qufudcj | Original post link

Yes, I want to operate the pre-tidb-cluster.

| username: qufudcj | Original post link

The information displayed by the two clusters is exactly the same, including address, port, and status.

| username: tidb菜鸟一只 | Original post link

I think we can delete one and keep the other, but we need to confirm that these two are exactly the same.

| username: qufudcj | Original post link

I’m not too confident about that; I’m afraid of deleting the cluster that is actually in use.

| username: qufudcj | Original post link

I am now confused as to why the error message points to a host and port that are completely unrelated to the new machine I want to scale out to. Is there any configuration I can change to ignore this error?

| username: wzf0072 | Original post link

Ports 2379/2380 on 12.73 belong to PD, and both clusters have 3 PD nodes. After you scale out a new PD node in the test cluster, scale in the 12.73 node. The 18.27 node is Drainer, the TiDB Binlog tool, and its data directory is different; it is currently in the Down state, so decide whether to delete or rebuild it based on the business situation.
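
In outline, that suggestion would look something like the following (a sketch only; the new PD host and the cluster name to operate on are placeholders):

# contents of a hypothetical pd-scale-out.yaml
pd_servers:
  - host: <new-pd-host>

# then, from the control machine (cluster name as appropriate)
tiup cluster scale-out <cluster-name> pd-scale-out.yaml
tiup cluster scale-in <cluster-name> -N 172.16.12.73:2379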

| username: qufudcj | Original post link

Sorry, I may not have made it clear. All the TiDB components in these two clusters are exactly the same, which means that the PD at 12.73 2379/2380 is registered in both clusters.

If I scale in this PD from the pre-test cluster, it will be physically removed, and the pre-tidb-cluster will inevitably lose this node as well.

I suspect that even if this PD is migrated away, the scale-out will just keep reporting other components’ ports as occupied. Could something be causing tiup cluster to misjudge?

| username: qufudcj | Original post link

All the nodes in the two clusters are exactly the same.

| username: xingzhenxiang | Original post link

Did you check the configuration file before scaling out?

| username: qufudcj | Original post link

Are you referring to the files used to set up the cluster initially? Since I didn’t set up the cluster, I can’t find them anymore. Is there any way to locate and modify them?

| username: xingzhenxiang | Original post link

It’s better to ask the official team. I don’t have experience with this either.

| username: tidb菜鸟一只 | Original post link

I suggest confirming that the two folders I mentioned are identical, then removing one of them. After that, use the remaining cluster name for the scale-out or scale-in and see if there are still any issues.

| username: qufudcj | Original post link

So you mean moving it on the machine using mv, right? I’ll try it this afternoon, thanks.
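
A conservative sketch of that (the backup destination is arbitrary; the idea is to move the metadata out of tiup's storage directory rather than delete it, keeping pre-tidb-cluster as stated earlier):

# move one cluster's metadata out of tiup's view, keeping it as a backup
mkdir -p /home/tidb/tiup-meta-backup
mv /home/tidb/.tiup/storage/cluster/clusters/pre-test /home/tidb/tiup-meta-backup/
# verify only the intended cluster remains
tiup cluster list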

| username: qufudcj | Original post link

Thank you. I was able to move one cluster’s folder aside with mv, but it’s quite strange that now neither of the two drainers is writing data to Kafka. There are no errors in the logs, just line after line of "write save point".

| username: qufudcj | Original post link

Occasionally, a line is printed:
[INFO] [client.go:716] ["[sarama] Client background metadata update: kafka: no specific topics to update metadata"]

| username: 小王同学Plus | Original post link

This is usually a normal log record when the Kafka cluster is running properly and does not require special handling. The message simply means there were no specific topics whose metadata needed updating, so no action was taken.

| username: srstack | Original post link

The check does not only look for conflicts on the new machine; it also checks against the existing clusters’ metadata. I suspect this situation was caused by a previously backed-up copy of the meta.yaml file.