A single TiKV region reaches 200GB and can neither be scheduled for migration nor split

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv单个region达到200G,无法进行迁移调度,也发split region

| username: tug_twf

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] 5.1.4
[Reproduction Path] What operations were performed that caused the issue
[Encountered Issue]
Our cluster has more than 40 TiKV nodes. One TiKV node went down, and we started decommissioning it. We then found that one region replica could not be scheduled to other nodes. After we moved the region manually through PD, the replica was scheduled to another healthy TiKV node, but it stayed in the pending state, i.e. it was unavailable.
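
For reference, a manual move of this kind is typically issued through pd-ctl along these lines; this is only a sketch, with placeholder region/store IDs and the cluster version from the post (5.1.4):

```shell
# Inspect the region and its peers, then ask PD to move the problem peer.
# 12345, 4, and 7 are placeholder IDs; replace the PD address as well.
tiup ctl:v5.1.4 pd -u http://<pd-ip>:2379 region 12345
tiup ctl:v5.1.4 pd -u http://<pd-ip>:2379 operator add transfer-peer 12345 4 7
tiup ctl:v5.1.4 pd -u http://<pd-ip>:2379 operator show
```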

We discovered that this region reached an astonishing 200GB.

We have set coprocessor.region-split-size to 96M.
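
For completeness, the split thresholds actually in effect on the TiKV nodes can be checked from the SQL layer; a minimal sketch, assuming a reachable TiDB endpoint (host and user are placeholders):

```shell
# Show the coprocessor split-related settings reported by every TiKV instance.
mysql -h <tidb-ip> -P 4000 -u root -p -e \
  "SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'coprocessor.region%';"
```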

Attempting a simple split region operation resulted in a timeout error.
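
For reference, a PD-driven split request would look roughly like the following; the region ID is a placeholder, and --policy=approximate is used on the assumption that scanning a 200GB region for exact split keys is what times out:

```shell
# Ask PD to split the oversized region using approximate (size-based) split keys
# instead of scanning the whole key range.
tiup ctl:v5.1.4 pd -u http://<pd-ip>:2379 operator add split-region 12345 --policy=approximate
```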

After the replica was scheduled, it stayed in the pending state. We suspect the region is simply too large to be moved successfully. The TiKV logs show attempts to split the region, but the splits fail with errors.

The GC life time is set to 10 minutes, but because of heavy GC pressure (a large number of deletions), a single GC cycle takes almost a day to complete.
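
A minimal sketch of how to check the GC settings and progress recorded by TiDB, assuming a reachable TiDB endpoint (host and user are placeholders):

```shell
# tikv_gc_life_time, tikv_gc_safe_point and tikv_gc_last_run_time show whether GC is keeping up.
mysql -h <tidb-ip> -P 4000 -u root -p -e \
  "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME LIKE 'tikv_gc%';"
```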

| username: 有猫万事足 | Original post link

You can try manual compaction.
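
A rough sketch of what a per-node manual compaction with tikv-ctl could look like; host/port are placeholders, and --bottommost force is included on the assumption that stale MVCC versions in the bottommost level are the problem:

```shell
# Compact the write and default column families of the kv RocksDB on one TiKV node,
# forcing the bottommost level to be rewritten so deleted versions are dropped.
# Run this during off-peak hours; it is I/O heavy.
tikv-ctl --host <tikv-ip>:20160 compact -d kv -c write --bottommost force
tikv-ctl --host <tikv-ip>:20160 compact -d kv -c default --bottommost force
```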

Additionally, you can also check out this article.

| username: tug_twf | Original post link

What is the reasoning behind trying manual compaction? This shouldn't be related to the underlying RocksDB compaction, right? And this has been going on for several days. Or do you suspect there are a lot of tombstones?

| username: 有猫万事足 | Original post link

Take a look at this

It depends on how the keys were deleted. If they were removed with DELETE statements, they are still scanned until compaction actually removes them; only DROP/TRUNCATE TABLE reclaims the space without waiting for compaction.

Currently known:

  1. Your GC is already lagging. The cause is unclear: it could be the large volume of deletions you mentioned, which GC cannot keep up with, or something else holding GC back (BR, TiSpark, and TiCDC can all block the GC safe point). There is no further information at the moment. It would be best to check the output of tiup ctl:v{version number} pd service-gc-safepoint (see the sketch below).
  2. You also mentioned a large number of deletions. One thing is certain: space freed by DELETE statements is only reclaimed during compaction.

So, considering both points, manual compaction is a direction worth trying.
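
A sketch of that safe point check, filled in with the cluster version from the post (the PD address is a placeholder):

```shell
# Shows which services (BR, TiCDC, TiSpark, etc.) are holding back the GC safe point.
tiup ctl:v5.1.4 pd -u http://<pd-ip>:2379 service-gc-safepoint
```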

| username: Fly-bird | Original post link

Is it possible to force it offline?

| username: tug_twf | Original post link

I force-deleted this region earlier, but the same problem comes back when the replica is rebuilt on other nodes.

| username: tug_twf | Original post link

From the RocksDB logs, it appears that the node has performed compaction. However, we can manually trigger a compaction during off-peak business hours to see the effect.

| username: h5n1 | Original post link

Check the region information:

tikv-ctl --host TIKV_IP:20160 region-properties -r REGION_ID

| username: 像风一样的男子 | Original post link

Could you check the TiKV logs for any error messages? Look for keywords like “panic”.
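
Something along these lines, assuming a default deployment layout (adjust the log path to your environment):

```shell
# Look for panics and errors around the split/apply path on the affected TiKV node.
grep -iE "panic|ERRO" /path/to/deploy/tikv-20160/log/tikv.log | tail -n 100
```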

| username: andone | Original post link

tikv-ctl --host TIKV_IP:20160 region-properties -r REGION_ID

| username: Kongdom | Original post link

Did manually triggering compaction have any effect?

| username: tug_twf | Original post link

There was no effect. In the end, I re-exported the table (fortunately, the table was small) and then renamed the table.
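
Roughly the shape of that workaround, with hypothetical table and path names (the actual export/import tooling may have differed):

```shell
# Export the affected table with Dumpling, reload it into a new table,
# then swap it in with renames so the data lands in freshly created regions.
tiup dumpling -h <tidb-ip> -P 4000 -u root --filetype sql -T db1.t -o /tmp/t_export
# ...import /tmp/t_export into db1.t_new (e.g. via TiDB Lightning or the mysql client), then:
mysql -h <tidb-ip> -P 4000 -u root -p -e "RENAME TABLE db1.t TO db1.t_old; RENAME TABLE db1.t_new TO db1.t;"
```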

| username: Kongdom | Original post link

:thinking: It’s also a solution. Remember to mark your own answer as the best.