Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: How to replace all PD nodes in the cluster
Version: TiDB 4.0.13
Requirement: Replace all existing PD nodes in the cluster
For example, the cluster currently has the following PD nodes:
192.168.1.1 (leader)
192.168.1.2
192.168.1.3
They need to be replaced with:
192.168.1.4
192.168.1.5
192.168.1.6
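Each addition of new PD nodes is a TiUP scale-out, which takes a topology file listing only the new hosts. Below is a minimal sketch that writes such a file. It assumes the new PDs use the default ports (client 2379, peer 2380) and default directories, so only host is set; the file name scale-out-pd.yaml is hypothetical.

```shell
#!/usr/bin/env bash
# Write a TiUP scale-out topology that contains only new PD servers.
# Ports and directories are left out, so TiUP's defaults apply (assumption:
# the cluster uses default PD ports 2379/2380).
set -euo pipefail

# Usage: write_pd_scaleout <output-file> <host>...
write_pd_scaleout() {
  local out="$1"
  shift
  {
    echo "pd_servers:"
    for host in "$@"; do
      echo "  - host: ${host}"
    done
  } > "$out"
}

# First addition in the plan: 192.168.1.4 and 192.168.1.5.
write_pd_scaleout scale-out-pd.yaml 192.168.1.4 192.168.1.5
cat scale-out-pd.yaml
```

The file would then be passed to tiup cluster scale-out <cluster-name> scale-out-pd.yaml. Calling the same function with only 192.168.1.6 produces the file for the later single-node addition.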
In practice, I first added two PD nodes, then removed one old non-leader node, transferred the PD leader to one of the newly added nodes, removed another old PD, added one more PD, and finally removed the last old one.
The specific operations are as follows:
- Add two PDs (192.168.1.4, 192.168.1.5)
- Remove 192.168.1.2
- Switch PD leader from 192.168.1.1 to 192.168.1.4 (member leader transfer)
- Remove 192.168.1.3
- Add one PD (192.168.1.6)
- Remove 192.168.1.1
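The sequence above can be written as a dry-run script, which makes the order easy to review before anything is executed. This sketch relies on several assumptions. The cluster name mycluster and the files scale-out-pd-1.yaml / scale-out-pd-2.yaml are hypothetical. PD is assumed to listen on the default client port 2379. The PD member name is assumed to follow TiUP's usual pd-<host>-<port> pattern, so confirm the real names with pd-ctl member before the transfer. By default the script only prints each command; DRY_RUN=0 executes them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the PD replacement order described in the post.
set -euo pipefail

CLUSTER="${CLUSTER:-mycluster}"   # hypothetical cluster name
PD_CTL=(tiup ctl:v4.0.13 pd)      # pd-ctl matching the cluster version
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumed TiUP default PD member name: pd-<host>-<client_port>.
pd_name() { echo "pd-$1-2379"; }

replace_pds() {
  # 1. Add 192.168.1.4 and 192.168.1.5 (listed in scale-out-pd-1.yaml).
  run tiup cluster scale-out "$CLUSTER" scale-out-pd-1.yaml
  # 2. Remove an old non-leader PD.
  run tiup cluster scale-in "$CLUSTER" --node 192.168.1.2:2379
  # 3. Move leadership to a new PD before the old leader is removed.
  run "${PD_CTL[@]}" -u http://192.168.1.1:2379 member leader transfer "$(pd_name 192.168.1.4)"
  run "${PD_CTL[@]}" -u http://192.168.1.4:2379 member leader show
  # 4. Remove the second old PD.
  run tiup cluster scale-in "$CLUSTER" --node 192.168.1.3:2379
  # 5. Add 192.168.1.6 (listed in scale-out-pd-2.yaml).
  run tiup cluster scale-out "$CLUSTER" scale-out-pd-2.yaml
  # 6. Remove the former leader last.
  run tiup cluster scale-in "$CLUSTER" --node 192.168.1.1:2379
  # Check the status of every instance afterwards.
  run tiup cluster display "$CLUSTER"
}

replace_pds
```

Reviewing the printed plan shows that the leader moves to 192.168.1.4 before 192.168.1.1 is scaled in, and that the PD member count never drops below three (3, 5, 4, 4, 3, 4, 3).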
The whole process completed without errors and business writes stayed normal, but some nodes briefly showed a Disconnected status, and two problems appeared:
- The Pump component reported heartbeat-related errors. Restarting Pump resolved them.
- Running tiup cluster display for the entire cluster was particularly slow. This was caused by the Pump error and was also resolved by restarting Pump.
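The recovery described above (restart Pump, then re-check the cluster) could look like the following dry-run sketch. The cluster name is hypothetical, binlogctl is assumed to be installed and on PATH, and 192.168.1.4 stands in for any of the new PD endpoints. binlogctl -cmd pumps lists the Pump nodes registered in PD together with their state.

```shell
#!/usr/bin/env bash
# Dry-run sketch: restart Pump after the PD replacement, then check its
# registration in PD and the overall cluster status.
set -euo pipefail

CLUSTER="${CLUSTER:-mycluster}"    # hypothetical cluster name
NEW_PD="http://192.168.1.4:2379"   # any of the new PD endpoints
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_pump() {
  # Restart every instance of the Pump role.
  run tiup cluster restart "$CLUSTER" -R pump
  # List Pump nodes and their state as recorded in PD.
  run binlogctl -pd-urls="$NEW_PD" -cmd pumps
  # Re-check the status of all instances.
  run tiup cluster display "$CLUSTER"
}

recover_pump
```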
So I would like to ask: what is the recommended procedure for replacing all PD nodes like this?