Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TIKV GC繁忙出见问题 (TiKV GC is busy and causing problems)

GC is busy and access keeps timing out. How to solve this?
Didn’t set it up. What should I do in this situation? The cluster keeps timing out.
gc.enable-compaction-filter
and restart the cluster.

Problem Description:
While TiKV GC worker CPU usage is at 100%, drop table or truncate table commands may fail to reclaim TiKV space after the table is deleted. Even after GC worker CPU usage drops, subsequent drop table or truncate table operations still do not reclaim space.
GitHub issue: False GcWorkerTooBusy caused by incorrect scheduled_tasks (tikv/tikv Issue #11903)
Affected Versions:
v5.0.6, v5.1.3, v5.2.3, v5.3.0
Troubleshooting Steps:
Cause of the Problem:
The drop table and truncate table commands in TiDB send unsafe destroy range requests to TiKV to delete a range of data.
When the TiKV GC worker is busy, its number of pending tasks may reach the limit. If unsafe destroy range tasks are added at that point, the task counter may be incremented without ever being decremented.
After several such operations, the counter permanently exceeds the GC worker's busy threshold. From then on, TiKV rejects all unsafe destroy range requests, so drop/truncate table operations fail to delete data.
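
To make the cause above more concrete, here is a minimal toy model of a counter that is incremented on submission but not decremented when the submission fails. This is only a sketch of the described behavior, not TiKV's actual code: the class `GcWorkerSketch`, the `queue_full` flag, the threshold value of 4, and the task names are all hypothetical and chosen purely for illustration.

```python
class GcWorkerSketch:
    """Toy model of the accounting described above; NOT TiKV's real implementation."""

    def __init__(self, busy_threshold, decrement_on_failure):
        self.busy_threshold = busy_threshold
        # False models the buggy accounting, True models the corrected one.
        self.decrement_on_failure = decrement_on_failure
        self.scheduled_tasks = 0  # counter compared against the busy threshold
        self.queue_full = False   # set to True to model a busy period
        self.pending = []         # tasks that were actually accepted

    def schedule(self, task):
        # Requests are refused once the counter has reached the threshold.
        if self.scheduled_tasks >= self.busy_threshold:
            return "rejected: too busy"
        self.scheduled_tasks += 1
        if self.queue_full:
            # The task was never enqueued, so the counter should be rolled back.
            if self.decrement_on_failure:
                self.scheduled_tasks -= 1
            return "rejected: queue full"
        self.pending.append(task)
        return "accepted"


def simulate(decrement_on_failure):
    worker = GcWorkerSketch(busy_threshold=4,
                            decrement_on_failure=decrement_on_failure)
    worker.queue_full = True           # busy period: every submission fails
    for i in range(4):
        worker.schedule(f"destroy-range-{i}")
    worker.queue_full = False          # busy period is over
    return worker.scheduled_tasks, worker.schedule("destroy-range-later")


print(simulate(decrement_on_failure=False))  # (4, 'rejected: too busy')
print(simulate(decrement_on_failure=True))   # (0, 'accepted')
```

In the leaky variant, the counter stays at the threshold even though nothing is queued, so later requests keep being rejected after the busy period ends. That matches the symptom that space is still not reclaimed once GC worker CPU usage has dropped.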
Workarounds:
Fixed Versions:
v5.0.7, v5.1.4, v5.3.1, v5.4.0
Bugfix PR: https://github.com/tikv/tikv/pull/11904
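
As a quick way to compare a cluster's TiKV version against the fixed versions listed above, a small helper along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, the version string is assumed to look like `v5.1.3` (optionally with a `-suffix`), and release series that do not appear in the list above return `None` rather than a guess.

```python
# Fixed releases listed in this thread, keyed by (major, minor) release series.
FIXED_RELEASES = {
    (5, 0): (5, 0, 7),
    (5, 1): (5, 1, 4),
    (5, 3): (5, 3, 1),
    (5, 4): (5, 4, 0),
}


def parse_version(text):
    """Turn 'v5.1.3' or '5.1.3-suffix' into the tuple (5, 1, 3)."""
    core = text.strip().lstrip("v").split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def includes_listed_fix(version_text):
    """Return True/False for series listed above, None for any other series."""
    version = parse_version(version_text)
    fixed = FIXED_RELEASES.get(version[:2])
    if fixed is None:
        return None  # series not listed in the thread; no conclusion drawn here
    return version >= fixed


for v in ["v5.0.6", "v5.1.4", "v5.2.3", "v5.3.0"]:
    print(v, includes_listed_fix(v))
```

A `False` result means the version is older than the fixed release listed for its series, so upgrading to that release (or later) is worth considering; `None` means this thread alone does not say which release in that series carries the fix.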
Is it a production environment? If it prompts that the worker is busy, it is recommended to check the resource usage at the operating system level.