Why are the remaining regions not fully migrated even after many days following a scale-in of a TiKV node?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 对tikv一节点scale-in, 剩余最后几十个region很多天后region都没有迁移完,请问是什么原因?

| username: DBRE

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.1.1
[Encountered Problem: Phenomenon and Impact]
When scaling in one TiKV node, the remaining regions have not migrated completely even after many days. What could be the reason? How should this be handled?

[Attachment: Screenshot/Log/Monitoring]
The PD leader node outputs the following log every 10 minutes:
[2023/12/14 09:54:07.883 +08:00] [WARN] [cluster.go:1232] ["store may not turn into Tombstone, there are no extra up store has enough space to accommodate the extra replica"] [store="id:3363993 address:"10.10.10.10:29120" state:Offline labels:<key:"host" value:"tikv8" > version:"5.1.1" status_address:"10.10.10.10:29130" git_hash:"4705d7c6e9c42d129d3309e05911ec6b08a25a38" start_timestamp:1702458741 deploy_path:"/work/tidb29100/deploy/tikv-29120/bin" last_heartbeat:1702518842815183571 "]

Storage space usage of each TiKV node is as follows (the screenshot from the original post is not available in the translated thread):

| username: WalterWj | Original post link

The log says it: the store may not turn into Tombstone because no other Up store has enough space to accommodate the extra replica. In short, not enough space ↑

| username: DBRE | Original post link

What are the rules for region migration? How is it determined that there is not enough space?

Here, each node has several hundred GB of available space.

| username: zhanggame1 | Original post link

In theory, if all the leaders are gone, it can be taken down.

| username: 像风一样的男子 | Original post link

Is the disk on the server 10.10.10.10 full?

| username: DBRE | Original post link

No, that is the IP of the TiKV node being scaled in.

| username: h5n1 | Original post link

Pick a couple of the remaining regions and check their status with pd-ctl region <region_id>.
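
For example, with pd-ctl (a sketch: the PD address is a placeholder, the store ID 3363993 is taken from the PD log above, and the tiup ctl version tag should match your cluster):

# Show the offline store and the regions still left on it, then inspect one region
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 store 3363993
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 region store 3363993
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 region <region_id>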

| username: DBRE | Original post link

(The original reply was a screenshot; its content is not available in the translated thread.)

| username: tidb菜鸟一只 | Original post link

Have the space usage rates of the other nodes exceeded 80%?

| username: DBRE | Original post link

The disk usage of each TiKV node is shown below; some nodes exceed 80%.


/dev/sdb 3.7T 625G 3.1T 17% /work
/dev/dfa 3.0T 2.2T 754G 75% /work
/dev/nvme0n1 3.5T 2.2T 1.4T 62% /work
/dev/nvme0n1 7.0T 6.1T 1002G 86% /work
/dev/nvme0n1 7.0T 6.1T 1002G 86% /work
/dev/nvme0n1 7.0T 6.1T 1002G 86% /work

Why is it 80%? What is the relationship between this and region scheduling? Is there any documentation related to region scheduling strategies?

| username: h5n1 | Original post link

Use pd-ctl to increase the low-space-ratio: config set low-space-ratio <value>
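
For example (a sketch: the PD address is a placeholder and 0.85 is only an illustrative value; the default low-space-ratio is 0.8):

# Check the current scheduling configuration, then raise the low-space threshold
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 config show
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 config set low-space-ratio 0.85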

| username: DBRE | Original post link

After deleting the space_placeholder_file files on the TiKV nodes that were at 86% usage (see the note after the df output below) and then executing config set low-space-ratio 0.85, the remaining 15 regions migrated quickly. :+1:

/dev/sdb 3.7T 625G 3.1T 17% /work
/dev/dfa 3.0T 2.2T 746G 75% /work
/dev/nvme0n1 3.5T 2.2T 1.4T 62% /work
/dev/nvme0n1 7.0T 5.7T 1.4T 81% /work
/dev/nvme0n1 7.0T 5.7T 1.4T 81% /work
/dev/nvme0n1 7.0T 5.7T 1.4T 81% /work
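
For context, space_placeholder_file is an emergency-reserve file that TiKV pre-allocates in its data directory (its size is governed by the storage.reserve-space setting, 5 GB by default); deleting it frees that reserve, and TiKV will normally recreate it on restart when space allows. One way to locate it (a sketch: the /work mount point comes from the df output above; the exact data directory depends on your topology):

# Find the placeholder file under the TiKV data directory and check its size
find /work -name space_placeholder_file 2>/dev/null | xargs ls -lh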

| username: Jellybean | Original post link

The log states the cause clearly: you are decommissioning a node, so the data on it has to be migrated to the other nodes, but the other nodes do not have enough space, and that is what causes the problem.

To determine whether the other nodes have enough space, the cluster uses the low-space-ratio parameter. So if you increase it, the cluster will consider the space sufficient again and the pending scheduling operations can proceed.

This is a temporary solution. The fundamental solution is to expand storage or promptly clean up unnecessary data to free up space.
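
To see how PD currently judges each store's space against low-space-ratio, one option is the pd-ctl store output (a sketch: the PD address is a placeholder and the --jq filter is only illustrative):

# Print id, state, capacity and available space for every store as PD sees them
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 store --jq='.stores[] | {id: .store.id, state: .store.state_name, capacity: .status.capacity, available: .status.available}'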

| username: DBRE | Original post link

One point of confusion: two TiKV nodes had disk usage below 80%, yet the remaining dozens of regions on the node being scaled in still were not migrating. Does low-space-ratio mean that once any one TiKV node's disk usage exceeds 80%, region migration stops?

/dev/dfa 3.0T 2.2T 746G 75% /work
/dev/nvme0n1 3.5T 2.2T 1.4T 62% /work (newly expanded TiKV)

| username: Jellybean | Original post link

No. Nodes that have not reached the threshold should, in theory, still be able to receive data from other nodes. But space is not the only thing to look at: the actual scheduling of data migration also weighs the distribution of Leaders and Regions, access hotspots, remaining space, the configured scheduling strategies, and other factors; the outcome is the combined result of all of them.
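
If you want to see what the scheduler is actually doing at a given moment, pd-ctl can list the active operators and schedulers (a sketch: the PD address is a placeholder):

# Pending/running scheduling operators, and the schedulers that generate them
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 operator show
tiup ctl:v5.1.1 pd -u http://<pd-address>:2379 scheduler show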

| username: 普罗米修斯 | Original post link

Yes, the cluster has to be considered as a whole. Even after excluding the stores that cannot receive replicas, there are still many constraints on the remaining stores, such as labels, data distribution, scheduling limits, and so on.

| username: DBRE | Original post link

Yes, the number of regions on the nodes that had not reached the threshold (one is the newly expanded TiKV, the other is the TiKV at 75% usage) kept increasing, yet the regions on the node being scaled in still did not move. My guess is that once those two under-threshold TiKVs had finished rebalancing, the regions on the scaled-in node would have started migrating again. But since the remaining regions were already migrated after setting low-space-ratio to 0.85 via config, this can no longer be reproduced.

| username: zxgaa | Original post link

Has the disk usage reached the threshold?

| username: dba远航 | Original post link

Adjust the low-space-ratio parameter

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.