Issues with Leader Migration in Placement SQL

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: placement sql迁移leader的问题

| username: TiDBer_Q6zIfbhF

Our production cluster spans 3 locations with 5 replicas: 3 replicas in location A and 1 replica each in locations B and C. All leaders are in location A.
We are now using placement SQL to migrate the cluster to the following layout:
0 replicas in location A, 3 replicas in location B, and 2 replicas in location C.

As shown in the monitoring, most of the data migrated within 1 minute, but the remaining small portion took more than 10 hours.
What could be the reason for this? My understanding is that the fast part is explained by the existing replica in location B becoming the leader directly. I cannot explain why some of the data took more than 10 hours. Could anyone with more experience explain?

| username: zhanggame1 | Original post link

Your understanding should be correct. The existing replica in location B becomes the leader directly, so that part is very fast. However, where a required replica is missing and has to be created first, the migration will be very slow.
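
The gap between the two phases can be seen by counting what has to change for each Region. Below is a minimal sketch that uses only the replica counts from the original post. `plan_moves` is a hypothetical helper for illustration, not a TiDB or PD API, and it ignores how PD actually orders and throttles these operations.

```python
# Replica counts per location, taken from the original post.
current = {"A": 3, "B": 1, "C": 1}
target = {"A": 0, "B": 3, "C": 2}


def plan_moves(current, target):
    """Return (to_add, to_remove): replica counts to create and drop per location.

    Hypothetical helper for illustration only; not part of TiDB or PD.
    """
    to_add, to_remove = {}, {}
    for loc in sorted(set(current) | set(target)):
        delta = target.get(loc, 0) - current.get(loc, 0)
        if delta > 0:
            to_add[loc] = delta
        elif delta < 0:
            to_remove[loc] = -delta
    return to_add, to_remove


to_add, to_remove = plan_moves(current, target)
print("new replicas to create:", to_add)    # {'B': 2, 'C': 1}
print("replicas to remove:", to_remove)     # {'A': 3}
```

Under these counts, every Region already has a replica in B that can take over leadership, but 3 new replicas per Region still have to be created and filled with data. That copying, rather than the leader transfer, may account for the slow part.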

| username: TiDBer_Q6zIfbhF | Original post link

Location B already has 1 replica, so it should hold a copy of all the data, right?

| username: redgame | Original post link

The data that migrated quickly may be relatively small, while the data that migrated slowly may be relatively large.

| username: tidb菜鸟一只 | Original post link

I am not sure whether your three-location, five-replica setup is three data centers across three locations or three data centers in the same city. Could you provide the placement SQL you originally deployed? The settings themselves may be unreasonable. For an online cluster with 3 locations and 5 replicas, with 3 replicas in location A, 1 replica each in locations B and C, and all leaders in location A, a write only needs 3 replicas, a majority, to agree before it succeeds. In theory, the 3 replicas in location A alone can form that majority and complete the write. If you then switch the leader from location A to the replica in location B, that replica may be behind the leader, so it has to catch up on the missing data before it can become the leader.
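
The majority argument above can be worked through with a small sketch. The log positions below are made-up numbers chosen for illustration, and `quorum` and `committed_index` are hypothetical helpers, not TiKV or PD APIs. The sketch only shows that 3 of 5 replicas are enough to commit, so a replica outside that majority can lag.

```python
def quorum(n):
    """Majority size for a Raft group with n voting replicas."""
    return n // 2 + 1


def committed_index(match_index):
    """Highest log index stored on at least a majority of replicas."""
    ordered = sorted(match_index.values(), reverse=True)
    return ordered[quorum(len(ordered)) - 1]


# Hypothetical log positions: the three replicas in location A are up to date,
# while the single replicas in B and C are behind.
match_index = {"A1": 100, "A2": 100, "A3": 100, "B1": 90, "C1": 85}

commit = committed_index(match_index)
lag_b = commit - match_index["B1"]

print("quorum size:", quorum(len(match_index)))  # 3
print("committed index:", commit)                # 100
print("entries B must catch up:", lag_b)         # 10
```

In this made-up state, entry 100 is committed using only the replicas in A, and the replica in B would need 10 more entries before it could safely take over as leader.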
