Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: GC数据不回收 (GC data is not being reclaimed)
[TiDB Usage Environment] Online
[TiDB Version] 5.0.6
[Encountered Problem]
The TiDB log reports “gc work is too busy”. Querying the GC safe point looks normal, but data is currently not being reclaimed, which is causing rapid data growth, and I don’t know how to resolve it.
In addition, each GC run takes 34 minutes to 1 hour.
The application performs around 10 TRUNCATEs every 5 minutes, but no unsafe destroy range activity is observed.
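For reference, a minimal sketch of the SQL used to check GC progress, reading the standard rows of the mysql.tidb table; comparing tikv_gc_safe_point and tikv_gc_last_run_time against the current time shows whether GC rounds are at least advancing:
-- inspect the GC safe point, last run time, and retention window
SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb
WHERE VARIABLE_NAME IN ('tikv_gc_safe_point', 'tikv_gc_last_run_time', 'tikv_gc_life_time');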
[Reproduction Path]
What operations were performed that led to the problem
[Problem Phenomenon and Impact]
[Attachment]
A tikv_gc_life_time of 24 hours is too long…
Setting tikv_gc_life_time to be slightly longer than your longest query time is sufficient; around 1 hour should be fine.
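For example, a short sketch of checking and shortening the setting; the SET GLOBAL form assumes the tidb_gc_life_time system variable introduced in v5.0, so verify against the docs for your exact version:
-- current retention window stored in mysql.tidb
SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time';
-- shorten MVCC retention to 1 hour (must stay longer than your longest query or backup)
SET GLOBAL tidb_gc_life_time = '1h';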
Check the mysql.gc_delete_range and mysql.gc_delete_range_done tables, which record the progress of GC after TRUNCATE/DROP. If nothing is being cleaned up, you have likely hit a bug.
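A minimal sketch of checking them; a steadily growing mysql.gc_delete_range while mysql.gc_delete_range_done stays flat suggests the delete-range step is stuck:
-- ranges still waiting to be deleted after TRUNCATE/DROP
SELECT COUNT(*) FROM mysql.gc_delete_range;
-- ranges GC has already processed
SELECT COUNT(*) FROM mysql.gc_delete_range_done;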
Reference: “TiDB nodes report a large number of [gc worker] delete range failed errors” - #6, by h5n1 - TiDB Q&A community…
Is there any other way besides upgrading?
Find a time window to run a manual compaction with tikv-ctl:
tikv-ctl --host tikv_ip:port compact -d kv -c write
tikv-ctl --host tikv_ip:port compact -d kv -c default
tikv-ctl --host tikv_ip:port compact -d kv -c write --bottommost force
tikv-ctl --host tikv_ip:port compact -d kv -c lock --bottommost force
Could you please explain the purpose and approximate risks of this operation? I searched the documentation but it wasn’t explained in detail.
The execution takes a long time and resource consumption is relatively high.
Your data volume may simply be too large for GC to keep up with the cleanup.
Will this reclaim the data that GC has left behind? Roughly how long would it take to run against about 2 TB of data on a single TiKV?
Hmm, it looks like it can’t be cleaned up. Are you also using compact?
This depends on the system performance. You can try it during off-peak hours.
There isn’t really an off-peak period; batch tasks are heavier in the evening, and data queries matter more in the morning. At best there are about 2 off-peak hours around noon, and I’m afraid that won’t be enough.
Let me run it in my environment and see how long it takes.
I’ve finished the run: 5.4 TB across 5 TiKV nodes on SAS disks, with an insert-only workload (no updates/deletes), so no space was actually released. Below are the time consumption and resource usage details. Without specifying --threads, disk IO was fine, but RocksDB CPU utilization did increase.
Thanks a lot. It looks like your cluster took about 22 hours. Here are a few screenshots from my side. I’m worried about affecting the cluster; if it causes problems while running, can it be terminated?
You can just kill it if needed. There’s also a --threads option, which defaults to 8; lowering it should reduce the resource impact.
Or, put another way, which version would I need to upgrade to in order to fix this? Are there any additional risks with upgrading? The documentation makes it look like a single command, and I’ve tried upgrading a small cluster, but I’m not sure whether a large cluster has any pitfalls or extra issues.
I don’t know exactly which version fixes it; you can compare the release notes against your current version’s release date. I recommend 5.2.4.
For the upgrade, you can refer to the following documentation: