How to Stop Data Migration to TiKV Nodes That Are Being Scaled Out

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何停止正在扩容的TiKV节点的数据迁移

| username: 搬砖er

[TiDB Usage Environment] Production Environment
[TiDB Version] 4.0.2
[Encountered Problem: Phenomenon and Impact]
Background: We are scaling out the online TiKV cluster to improve performance. The original cluster had 15 nodes, and 3 more have been added. During the migration we found that the new nodes have insufficient disk capacity. We want to stop the rebalancing while keeping the current total of 18 nodes; that is, we don't want the existing data to keep migrating to the 3 new nodes.
So far, `store limit all 1` and `config set region-schedule-limit 4` have been applied, which slowed the migration down. Is there a way to completely stop existing data from migrating to the 3 new nodes?
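
For reference, both settings are pd-ctl commands; a minimal sketch of running them, assuming PD listens at http://127.0.0.1:2379:

```shell
# Cap scheduling operations (add-peer/remove-peer) at 1 per minute on every store
pd-ctl -u http://127.0.0.1:2379 store limit all 1

# Allow at most 4 concurrent region-scheduling tasks
pd-ctl -u http://127.0.0.1:2379 config set region-schedule-limit 4
```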

| username: TiDBer_jYQINSnf | Original post link

`scheduler remove balance-region-scheduler`
`scheduler remove balance-hot-region-scheduler`
These two commands immediately stop all region migration, but leaving them removed for a long time is not advisable.
`store weight xxx 1 0.x`
This command adjusts the store weights. xxx is the new store's ID, followed by two numbers: 1 is the leader weight and 0.x is the region weight. Lower the region weight to a suitably small value and regions will no longer migrate to that store.

Additionally, if your nodes report their disk space correctly, regions will not be forcibly migrated onto a node that lacks space. That mainly goes wrong with a mixed deployment of multiple instances on one machine, in which case you would need the methods above to stop the migration.
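
A sketch of both approaches via pd-ctl (the PD address and the store ID 18 are placeholders, not values taken from this cluster):

```shell
# Stop all region balancing immediately (add the schedulers back later)
pd-ctl -u http://127.0.0.1:2379 scheduler remove balance-region-scheduler
pd-ctl -u http://127.0.0.1:2379 scheduler remove balance-hot-region-scheduler

# Restore balancing once the disk situation is resolved
pd-ctl -u http://127.0.0.1:2379 scheduler add balance-region-scheduler
pd-ctl -u http://127.0.0.1:2379 scheduler add balance-hot-region-scheduler

# Or, more surgically: keep the leader weight at 1 but lower the region
# weight on a new store so the scheduler places fewer regions on it
pd-ctl -u http://127.0.0.1:2379 store weight 18 1 0.5
```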

| username: 数据库真NB | Original post link

Can’t we take those three newly added nodes offline one by one?

| username: 搬砖er | Original post link

There is no mixed deployment, and the disk space information is reported correctly.

| username: 搬砖er | Original post link

The current cluster storage usage: [screenshot]

The newly added node is the one in the red box: [screenshot]

Normally a node in this cluster holds about 118k regions and uses about 2.7T of disk. The new node has received only 75k regions (roughly 64% of a normal node's count) yet already uses 2.5T (roughly 93% of the space). That doesn't look proportional, and I'm not sure whether continuing the scale-out, or scaling back in, will cause problems.

| username: 数据库真NB | Original post link

You could pick whichever of the three new nodes has the least disk usage and try stopping its service first. Also, plan to buy sufficiently large disks. Could you dynamically expand the disk on one node at a time and then bring it back into the cluster?

| username: tidb菜鸟一只 | Original post link

Actually, letting the migration continue is not a problem. When a node's disk space runs low it will trigger low-space-ratio, and regions will no longer be migrated to that node.

| username: 搬砖er | Original post link

There does seem to be a capacity issue during the migration, though: the node that has received only about 70% of a normal node's regions already uses nearly as much disk. I'm worried the new node's disk will fill up if the migration continues.

| username: tidb菜鸟一只 | Original post link

You can check that parameter's configuration; low-space-ratio defaults to 0.8 (80%). Once a store's disk usage reaches that ratio, regions are no longer migrated to that node.
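
A sketch of checking and, if needed, lowering the threshold with pd-ctl (PD address is an assumption):

```shell
# Show the current space thresholds in the PD scheduling config
pd-ctl -u http://127.0.0.1:2379 config show | grep -E "low-space-ratio|high-space-ratio"

# Optionally make PD back off earlier, e.g. at 70% disk usage
pd-ctl -u http://127.0.0.1:2379 config set low-space-ratio 0.7
```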

| username: TiDBer_jYQINSnf | Original post link

No worries, migration alone won't overfill the disk.
The disk usage looks high because the data on the old nodes has already been compacted down to level 6 of RocksDB, while newly migrated data starts at level 0 and has to work its way down level by level, so it temporarily takes more space. It will gradually come back down.
RocksDB's compression gets more effective at the lower levels.
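
For reference, TiKV's default RocksDB settings show why the bottom levels take less space: they use stronger compression than the top ones. A sketch of the commonly seen defaults (verify against your own version's tikv.toml):

```toml
[rocksdb.defaultcf]
# L0-L1 uncompressed, L2-L4 lz4, L5-L6 zstd: data that has been
# compacted to the bottom levels is stored far more compactly
compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"]
```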

| username: Kongdom | Original post link

First of all, the scale-out itself should already be complete; the migration is the data-rebalancing step that follows it and is separate from the scale-out. And if a node's available disk capacity runs low, the cluster will automatically stop migrating data to that node. That's my understanding, at least.

| username: Soysauce520 | Original post link

Adjust the weight according to the store ID in PD.
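
To find the store IDs of the new nodes, list the stores (a sketch; PD address is an assumption):

```shell
# Shows every store's ID, address, weights, capacity, and region counts
pd-ctl -u http://127.0.0.1:2379 store
```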

| username: 小毛毛虫 | Original post link

If it is a production environment, it's better for long-term operational stability to keep the disks of all TiKV instances consistent. Mismatched disks tend to cause problems in situations like replacing machines or switching instances.

| username: lemonade010 | Original post link

Got it, thanks for sharing.