Scale-In (Shrinking) Process

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: scale-in 缩容过程

| username: 胡杨树旁

When scaling in a cluster, the remaining servers and disk space are sufficient to hold the data after the scale-in. We are currently scaling in from 11 TiKV servers to 8, removing 3 machines. With 3 replicas, how is the leader of a Region on a removed node migrated to the other peers during this process? Does the original leader lose its heartbeat, so that the other two followers hold an election and choose a new leader? If so, the Region would have only 2 replicas at that point, which does not satisfy the 3-replica requirement. Is the missing replica replenished immediately after the leader migration?

| username: TiDBer_CQ | Original post link

  1. Leader Migration:
  • When a TiKV node is scaled in, the leaders on it need to be migrated to other nodes.
  • The leader migration process is roughly as follows:
    1. Original Leader Loses Heartbeat: When a TiKV node is scaled in, the original leader may stop sending heartbeats and is no longer considered a valid leader.
    2. Other Followers Start an Election: The other two followers start an election to choose a new leader.
    3. Electing a New Leader: During the election, the TiKV nodes communicate with each other and use the Raft algorithm to elect a new leader.
    4. New Leader Takes Over: Once elected, the new leader takes over the responsibilities of the original leader, including handling client requests and replicating data.
  2. Replica Replenishment:
  • During the scale-in, once the peers on the removed node are gone, some Regions may temporarily have fewer replicas than required.
  • The cluster automatically detects this situation and replenishes the replicas.
  • The replica replenishment process is roughly as follows:
    1. Detecting Insufficient Replicas: The cluster detects Regions whose replica count is below the configured number.
    2. Selecting a Suitable Node: A suitable node is chosen to host the new replica.
    3. Replicating Data: The new replica copies data from the existing replicas to ensure consistency.
    4. Maintaining Raft Logs: The new replica maintains Raft logs to ensure data durability and consistency.
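
For illustration, here is a minimal operational sketch of this flow (the cluster name, addresses, store ID, and the pd-ctl version are placeholders, not values from this thread):

```shell
# Remove one TiKV node; PD then transfers the leaders on it and
# re-creates the affected replicas on the remaining stores.
tiup cluster scale-in my-cluster --node 10.0.1.11:20160

# Watch the store drain (find its store ID from the `store` output):
# leader_count drops first (leader transfer), then region_count falls
# to 0 (replica replenishment), and the store finally goes
# Offline -> Tombstone.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store 1
```
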
| username: 胡杨树旁 | Original post link

During the scale-in I checked the monitoring and found that the time of leader migration and the time when a large number of empty regions appeared did not match: the leader migration happened earlier than the surge of empty regions. Can it be understood this way: leader migration and replica replenishment do not happen at the same time; the leader migration completes first, then detection runs, and missing replicas are replenished if any are found? Another question: I noticed that some people manually evict leaders from the TiKV nodes before scaling in. What is the difference between manually evicting the leaders and waiting for the cluster to migrate them on its own?

| username: 江湖故人 | Original post link

Manual eviction is to make the scaling down time more controllable, right?

| username: tidb菜鸟一只 | Original post link

When scaling in TiKV, the leaders on the node being removed are migrated first, and the replicas are replenished afterwards. Manual eviction moves the leaders off in advance, so that when you execute the scale-in command you don't have to wait too long and risk exceeding the default timeout.
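
A sketch of that workflow, assuming placeholder addresses, store ID, and pd-ctl version: evict the leaders first, wait for the store's leader count to reach 0, then run the scale-in.

```shell
# 1. Move all leaders off store 1 before the scale-in.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 scheduler add evict-leader-scheduler 1

# 2. Wait until the store's leader_count reaches 0.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store 1

# 3. Then scale in the node; the command only has to wait for the
#    replicas to be moved, not for the leader transfer.
tiup cluster scale-in my-cluster --node 10.0.1.11:20160
```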

| username: dba远航 | Original post link

The leaders on the node being removed are migrated to other nodes.

| username: redgame | Original post link

The Leader replica will be migrated.

| username: TiDBer_aaO4sU46 | Original post link

Add a replica.

| username: residentevil | Original post link

Do we need to observe the status of the nodes being scaled down? How many types of statuses are there in total?

| username: 舞动梦灵 | Original post link

Brother, do you have any documentation or commands related to manually evicting the leader? I searched the community and official documentation but couldn’t find anything.

| username: tidb菜鸟一只 | Original post link

Evict all leaders from a TiKV node to the other TiKV nodes with the pd-ctl command:
scheduler add evict-leader-scheduler 1: add a scheduler that evicts all leaders from Store 1.
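
A short sketch of the surrounding steps (store ID 1, the PD address, and the pd-ctl version are placeholders):

```shell
# Confirm the evict-leader scheduler is registered and watch the
# store's leader_count drop to 0.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 scheduler show
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store 1

# Remove the scheduler afterwards (e.g. once the node has been scaled
# in, or if you decide to keep it after all).
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 scheduler remove evict-leader-scheduler-1
```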

| username: tidb菜鸟一只 | Original post link

By using pd-ctl, you can view the status information of TiKV Store. The specific statuses of TiKV Store are Up, Disconnect, Offline, Down, and Tombstone. The relationships between these statuses are as follows:

  • Up: Indicates that the current TiKV Store is in a service-providing state.
  • Disconnect: When the heartbeat information between PD and TiKV Store is lost for more than 20 seconds, the Store’s status will change to Disconnect. If the time exceeds the duration specified by max-store-down-time, the Store will change to Down status.
  • Down: Indicates that the TiKV Store has been disconnected from the cluster for a time exceeding the duration specified by max-store-down-time, which defaults to 30 minutes. After this time, the corresponding Store will become Down, and the cluster will start replenishing the replicas of each Region on the surviving Stores.
  • Offline: When a TiKV Store is manually taken offline through PD Control, the Store will change to Offline status. This status is only an intermediate state of the Store being taken offline. The Store in this state will move all its Regions to other Up status Stores that meet the relocation conditions. When the Store’s leader_count and region_count (obtained in PD Control) both show 0, the Store will change from Offline status to Tombstone status. In the Offline state, it is prohibited to shut down the Store service and its physical server. During the offline process, if there are no other target Stores in the cluster that meet the relocation conditions (e.g., there are not enough Stores to continue meeting the cluster’s replica count requirements), the Store will remain in the Offline state.
  • Tombstone: Indicates that the TiKV Store is completely offline and can be safely cleaned up using the remove-tombstone interface.
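
For example, a quick way to check these statuses and clean up stores that have reached Tombstone (the PD address and pd-ctl version are placeholders):

```shell
# List all stores with their state (Up / Disconnect / Offline / Down /
# Tombstone), leader_count, and region_count.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store

# Once the scaled-in stores show Tombstone, clean up their metadata.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store remove-tombstone
```
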
| username: WinterLiu | Original post link

The first poster explained it in great detail; I'm impressed.

| username: 烂番薯0 | Original post link

Isn’t this thing automatic?

| username: 胡杨树旁 | Original post link

I have another question. If I manually stop a TiKV instance, will the leaders on this instance be migrated? Or will the leaders on this instance die, causing some data to be unreadable?

| username: 胡杨树旁 | Original post link

In other words, manually evicting the leaders just shortens the overall time of the scale-in; the process itself is the same as waiting for the automatic scale-in.

| username: tidb菜鸟一只 | Original post link

Yes, it will migrate. Whether it is a normal shutdown or an abnormal one, as long as PD cannot reach your TiKV node for a certain period of time, it treats the node as unavailable, and followers on other nodes become leaders and continue to serve requests.
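
The relevant time window can be inspected and tuned in pd-ctl; for example (the PD address and pd-ctl version are placeholders, and 30m is simply the default value, not a recommendation):

```shell
# Show the scheduling configuration, including max-store-down-time:
# the window after which PD treats a disconnected store as Down and
# starts re-creating its replicas on other stores.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 config show

# Adjust it if needed, e.g. during planned maintenance.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 config set max-store-down-time 30m
```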

| username: tidb菜鸟一只 | Original post link

The same. When you evict manually, you can also adjust the Region and leader migration speed to shorten the time.
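
A sketch of the knobs typically involved (the values are arbitrary examples rather than recommendations; the PD address and pd-ctl version are placeholders):

```shell
# Allow more concurrent leader-transfer and Region-scheduling operations.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 config set leader-schedule-limit 8
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 config set region-schedule-limit 2048
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 config set replica-schedule-limit 64

# Raise the per-store rate limit for adding/removing peers.
tiup ctl:v7.5.0 pd -u http://10.0.1.10:2379 store limit all 15
```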

| username: 胡杨树旁 | Original post link

Thank you, I’ve learned something new.

| username: residentevil | Original post link

Very good