Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: What is the difference between taking a PD node offline and scaling it in? (PD节点下线和缩容有什么区别)
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0
[Reproduction Path] Could you please explain the difference between taking a PD node offline and scaling it in?
A node that is taken offline can be brought back online, and its data will still be there.
If it is scaled in, its data is handed over to other nodes and the node itself is gone.
However, timing matters. If the node stays offline beyond the allowed period, its data becomes too outdated to be reused and is discarded, forcing a resynchronization.
Scaling in means the node is gone; taking it offline means it is only shut down.
If it is down for a long time and its data is outdated, it will also have to resynchronize.
Offline equals stop; scaling in equals deletion.
Once scaled in, the node is gone and has to be scaled out again to come back.
If conditions permit, scaling in is the safest way to handle it.
After deletion, the node’s monitoring and configuration are all gone.
Monitoring for the node is not visible after it goes offline.
After going offline, the data is still there. After scaling in, the data is gone, and you need to scale out again to get the node back.
Offline can be used for host maintenance.
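In TiUP terms, the distinction drawn in the replies above maps roughly onto three commands. The following is only a sketch: the cluster name tidb-test is borrowed from later in this thread, the PD address is a placeholder, and the run helper is a hypothetical wrapper that prints each command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: PD_NODE is a placeholder address, not a value from this thread.
CLUSTER="tidb-test"
PD_NODE="10.0.1.5:2379"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# "Offline" (vacation): stop the PD process; the node stays in the topology.
pd_offline() { run tiup cluster stop "$CLUSTER" -N "$PD_NODE"; }

# Bring the stopped node back into service.
pd_online() { run tiup cluster start "$CLUSTER" -N "$PD_NODE"; }

# "Scale in" (resignation): remove the node from the cluster topology.
pd_scale_in() { run tiup cluster scale-in "$CLUSTER" -N "$PD_NODE"; }

pd_offline
pd_online
pd_scale_in
```

With the default dry run, the script only prints the three command lines, which makes it safe to review before trying anything on a real cluster.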
It can be understood like this:
Offline equals going on vacation
Scaling in equals resignation
A vivid analogy; the expert’s description is very apt.
However, it seems that the original poster’s question is not about going offline, but about deleting a PD member. Does that have the same effect as tiup cluster stop? The official documentation does not describe this clearly either.
Well, essentially there is one fewer PD node in service either way, whether it is offline or scaled in. The difference between the two lies in whether the node can be recovered by later operations.
What I mean is, does “member delete”
remove this PD node from the member list while the PD process keeps running and is just temporarily not participating in elections,
or
is it equivalent to executing tiup cluster stop tidb-test -N xxx.xxx.xxx.xxx:2379?
I ask because I saw the following statement. It gives me the impression that the service of the deleted PD node is still running and has only been temporarily removed from the member list. If the service were not running, there would be nothing to reschedule.
If the node storage cannot be automatically migrated (for example, using local storage), you need to delete the PD Member to achieve rescheduling.
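One way to tell which of the two readings holds is to compare the PD member list with the process status before and after the delete, on a test cluster only. The sketch below assumes a TiUP-managed cluster named tidb-test on v5.4.0; the endpoint and the member name pd-3 are hypothetical, and the run helper prints each command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: PD_ENDPOINT and PD_NAME are hypothetical values.
CLUSTER="tidb-test"
PD_ENDPOINT="http://10.0.1.4:2379"  # a PD that stays in the cluster
PD_NAME="pd-3"                      # name of the member to delete

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the current PD members through pd-ctl.
show_members() { run tiup ctl:v5.4.0 pd -u "$PD_ENDPOINT" member; }

# Remove one member from the PD member list by name.
delete_member() { run tiup ctl:v5.4.0 pd -u "$PD_ENDPOINT" member delete name "$PD_NAME"; }

# Show the status of every component as TiUP sees it.
show_status() { run tiup cluster display "$CLUSTER"; }

show_members
show_status
delete_member
show_members
show_status
```

Comparing the two show_status outputs indicates whether the process of the deleted member is still reported as running after it has left the member list.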