Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: "v7.5.0: scale-in of PD is incomplete, causing data confusion between two clusters"

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.5.0
[Reproduction Path] After a PD node has been scaled in successfully, the remaining PD nodes of the cluster keep trying to reconnect to it.
[Encountered Problem: Problem Phenomenon and Impact] If the address of this offline node is later scaled out into another PD cluster, the two clusters merge, causing data confusion.
10.25.248.131:2380 (VMS584328) previously belonged to the tikv-oversea cluster. It was scaled in at 2024/04/08 10:19:26, after which tiup cluster display tikv-oversea no longer listed 10.25.248.131:2380, and the server VMS584328 was then taken offline. However, pd.log shows that the tikv-oversea PD nodes kept trying to connect to 10.25.248.131 (client port 2379 in the log below) and kept reporting connection errors until 2024/04/10.
On 2024/04/10, a new server, VMS602679, was brought up and happened to reuse the IP 10.25.248.131. At 2024/04/10 13:47, 10.25.248.131:2380 (VMS602679) was scaled out into the tikv-dal-test cluster, giving tikv-dal-test 3+1 PD nodes. At that moment the 6 PD nodes of tikv-oversea also reconnected to 10.25.248.131, forming a 6+1 set. The 3+1+6 = 10 PD nodes then all connected to one another, forming a single 10-node PD cluster and causing data confusion.
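The timeline above comes down to one observable symptom: dial attempts to an address continue after its "removed member" event. The following is a minimal Python sketch of how that could be checked against pd.log. The function name dials_after_removal and the regular expressions are hypothetical helpers written for this post, not part of PD or TiUP, and the sample lines are shortened versions of the log lines quoted further down.

```python
import re
from datetime import datetime

# Leading timestamp of a PD log line, e.g. [2024/04/08 10:19:26.254 +08:00]
TS = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
# Host that a failed gRPC dial was aimed at
DIAL = re.compile(r"failed to connect to \{http://([\d.]+):\d+")
# Host named in a "removed member" event
REMOVED = re.compile(r"removed-remote-peer-urls=.*?http://([\d.]+):\d+")


def parse_ts(line):
    """Return the line's timestamp (second precision), or None if absent."""
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S") if m else None


def dials_after_removal(lines):
    """Map each removed host to the timestamps of dial failures logged after its removal."""
    removed_at = {}
    stale = {}
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skips elided lines such as "…"
        if "removed member" in line:
            m = REMOVED.search(line)
            if m:
                removed_at.setdefault(m.group(1), ts)
            continue
        m = DIAL.search(line)
        if m and m.group(1) in removed_at and ts >= removed_at[m.group(1)]:
            stale.setdefault(m.group(1), []).append(ts)
    return stale


# Shortened sample lines (fields trimmed for brevity, not verbatim log output).
sample = [
    '[2024/04/08 10:19:26.254 +08:00] [INFO] [cluster.go:422] ["removed member"] '
    '[removed-remote-peer-urls="[http://10.25.248.131:2380]"]',
    '[2024/04/08 10:19:27.958 +08:00] [WARN] [grpclog.go:60] '
    '["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }"]',
    '[2024/04/10 13:46:21.890 +08:00] [WARN] [grpclog.go:60] '
    '["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }"]',
]

stale = dials_after_removal(sample)
for host, times in stale.items():
    print(host, "still dialed", len(times), "times after removal; last at", times[-1])
```

Any host that shows up in the result is one that the cluster still dials even though it has been removed, which is the condition that makes reusing its IP dangerous.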
tikv-oversea PD nodes:
10.109.220.10:2379
10.109.220.9:2379
10.25.248.208:2379
10.25.248.246:2379
10.58.228.76:2379
10.58.228.86:2379
tikv-dal-test PD nodes:
10.58.228.37
10.109.216.124
10.25.248.212
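To make the "3+1" and "6+1" arithmetic concrete, here is a small Python sketch using the two node lists above. It only models the endpoint sets with plain set operations; it does not talk to PD or etcd, and the helper names bridging_hosts and foreign_members are made up for illustration.

```python
# PD hosts of each cluster, as listed in this post.
OVERSEA = {
    "10.109.220.10", "10.109.220.9", "10.25.248.208",
    "10.25.248.246", "10.58.228.76", "10.58.228.86",
}
DAL_TEST = {"10.58.228.37", "10.109.216.124", "10.25.248.212"}

REUSED_IP = "10.25.248.131"  # removed from tikv-oversea, later added to tikv-dal-test


def bridging_hosts(view_a, view_b):
    """Hosts that appear in both clusters' endpoint views."""
    return set(view_a) & set(view_b)


def foreign_members(observed, expected):
    """Hosts in an observed endpoint list that the cluster is not expected to contain."""
    return set(observed) - set(expected)


oversea_view = OVERSEA | {REUSED_IP}    # the "6+1" view
dal_test_view = DAL_TEST | {REUSED_IP}  # the "3+1" view

bridge = bridging_hosts(oversea_view, dal_test_view)
merged = oversea_view | dal_test_view

print("bridging host(s):", sorted(bridge))
print("merged endpoint count:", len(merged))
print("foreign to tikv-oversea:", sorted(foreign_members(merged, OVERSEA)))
```

The two clusters share no host except the reused IP, so that single address is the only link between them, and the union of the two views is the 10-node set described above.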
tikv-oversea pd log:
[2024/04/07 18:37:25.977 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=7->8] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.9:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.25.248.131:2379,http://10.25.249.164:2379]"] [endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.109.220.9:2379,http://10.25.248.131:2379,http://10.25.249.164:2379,http://10.25.248.208:2379]"]
[2024/04/08 10:19:26.254 +08:00] [INFO] [cluster.go:422] ["removed member"] [cluster-id=468758231b5b0393] [local-member-id=edff54aa33575887] [removed-remote-peer-id=f67c161a4e9b9cb8] [removed-remote-peer-urls="[http://10.25.248.131:2380]"]
[2024/04/08 10:19:27.958 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused\". Reconnecting..."]
[2024/04/08 10:19:27.958 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused\". Reconnecting..."]
…
[2024/04/09 14:46:33.395 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection timed out\". Reconnecting..."]
[2024/04/09 14:49:25.265 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: i/o timeout\". Reconnecting..."]
…
[2024/04/10 13:44:05.323 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused\". Reconnecting..."]
[2024/04/10 13:45:57.545 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused\". Reconnecting..."]
[2024/04/10 13:46:21.890 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = \"transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused\". Reconnecting..."]
[2024/04/10 13:47:58.088 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=6->7] [last-endpoints="[http://10.25.248.246:2379,http://10.58.228.76:2379,http://10.25.248.208:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379]"] [endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379,http://10.25.248.208:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
[2024/04/10 13:48:08.085 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=6->7] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.9:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.25.248.208:2379]"] [endpoints="[http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.25.248.208:2379,http://10.58.228.76:2379,http://10.109.220.9:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
[2024/04/10 13:48:18.090 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=7->10] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379,http://10.25.248.208:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"] [endpoints="[http://10.109.220.10:2379,http://10.58.228.76:2379,http://10.109.220.9:2379,http://10.58.228.86:2379,http://10.109.216.124:2379,http://10.25.248.212:2379,http://10.25.248.208:2379,http://10.58.228.37:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
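The "update endpoints" lines are easier to read as a diff between last-endpoints and endpoints. Below is a minimal Python sketch that extracts both lists from such a line and reports what was added and removed. The function endpoint_diff and its regular expressions are hypothetical helpers for reading this log, not a PD tool, and they assume the line layout shown above (they tolerate either straight or typographic quotes around the lists).

```python
import re

# Capture the bracketed URL list after each field name; any quote character
# between "=" and "[" is skipped.
LAST = re.compile(r"\[last-endpoints=[^\[]*\[([^\]]*)\]")
CURR = re.compile(r"\[endpoints=[^\[]*\[([^\]]*)\]")


def split_urls(raw):
    """Turn 'http://a:2379,http://b:2379' into a set of URLs."""
    return {u.strip() for u in raw.split(",") if u.strip()}


def endpoint_diff(line):
    """Return (added, removed) endpoint lists for one 'update endpoints' line, or None."""
    last, curr = LAST.search(line), CURR.search(line)
    if not (last and curr):
        return None
    before, after = split_urls(last.group(1)), split_urls(curr.group(1))
    return sorted(after - before), sorted(before - after)


# The 13:48:18 line from the log above (num-change=7->10).
line = (
    '[2024/04/10 13:48:18.090 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] '
    '[num-change=7->10] '
    '[last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,'
    'http://10.109.220.10:2379,http://10.109.220.9:2379,http://10.25.248.208:2379,'
    'http://10.25.248.246:2379,http://10.25.248.131:2379]"] '
    '[endpoints="[http://10.109.220.10:2379,http://10.58.228.76:2379,'
    'http://10.109.220.9:2379,http://10.58.228.86:2379,http://10.109.216.124:2379,'
    'http://10.25.248.212:2379,http://10.25.248.208:2379,http://10.58.228.37:2379,'
    'http://10.25.248.246:2379,http://10.25.248.131:2379]"]'
)

added, removed = endpoint_diff(line)
print("added:", added)
print("removed:", removed)
```

For the 13:48:18 line, the three added endpoints are exactly the three tikv-dal-test PD nodes, which is the point at which the tikv-oversea PD nodes start treating the other cluster's members as their own.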