Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: The region average written keys metric of one TiKV node suddenly spiked, causing application stalls.
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0; topology: 2 TiDB, 3 PD, 4 TiKV, 2 HA
[Reproduction Path] The region average written keys metric of one TiKV node suddenly spiked, causing the application to stall. What causes this, and how can it be avoided?
Check Top SQL on the TiDB Dashboard for this TiKV node at that point in time to see whether there are any slow queries.
I see many DELETE statements, and one of them takes about 8.1 seconds to execute.
Large-scale deletes tend to get slower as they progress; this can be optimized.
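One common way to optimize a large delete is to split it into small batches, each committed as its own short transaction. The following is only a minimal sketch of that idea, not something taken from this thread: `delete_in_batches`, its parameters, and the default batch size and pause are hypothetical, and `conn` is assumed to be a DB-API 2.0 connection to TiDB from any MySQL-compatible Python driver.

```python
import time


def delete_in_batches(conn, table, where_clause, batch_size=5000, pause_s=0.1):
    """Delete matching rows in small transactions; return the total rows deleted.

    Assumptions (hypothetical, for illustration only):
    - conn is a DB-API 2.0 connection to TiDB.
    - table and where_clause are trusted strings supplied by the caller.
    """
    sql = f"DELETE FROM {table} WHERE {where_clause} LIMIT {int(batch_size)}"
    total = 0
    while True:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            deleted = cur.rowcount  # rows affected by this batch
        finally:
            cur.close()
        conn.commit()  # one short transaction per batch
        total += deleted
        if deleted < batch_size:
            # A partial (or empty) batch means nothing is left to delete.
            return total
        time.sleep(pause_s)  # brief pause to spread the write load
```

For example, `delete_in_batches(conn, "t", "created_at < '2022-01-01'")` would issue repeated `DELETE ... LIMIT 5000` statements until no rows match. If the batches themselves slow down over time, narrowing each batch to a specific key or time range in `where_clause` may help, since each statement then scans fewer already-deleted rows that have not yet been garbage collected.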
If there are a large number of deletes, the health of the table's statistics may deteriorate, which can cause the cost-based optimizer (CBO) to choose poor execution plans. Simply put, stale statistics lead to inaccurate estimates, so queries become progressively slower. It is recommended to run ANALYZE TABLE to re-collect statistics and restore the table's health after confirming that the deleted data has been garbage collected, preferably when the system is not busy. Large-scale deletions are also best performed during off-peak hours.
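The health check and re-collection described above can be sketched as follows. This is an illustrative sketch under stated assumptions: `analyze_if_unhealthy` and the threshold of 80 are hypothetical, `conn` is assumed to be a DB-API 2.0 connection whose cursor returns rows as tuples and accepts `%s` placeholders, and the health value is read from the last column (`Healthy`) of TiDB's `SHOW STATS_HEALTHY` output.

```python
def analyze_if_unhealthy(conn, db, table, threshold=80):
    """Run ANALYZE TABLE if the table's statistics health is below threshold.

    Returns True if ANALYZE TABLE was issued, False otherwise.
    Assumptions (hypothetical): conn is a DB-API 2.0 connection to TiDB,
    rows come back as tuples, and db/table are trusted identifiers.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SHOW STATS_HEALTHY WHERE Db_name = %s AND Table_name = %s",
            (db, table),
        )
        rows = cur.fetchall()
        # Healthy is the last column; take the worst value across partitions.
        health = min((int(row[-1]) for row in rows), default=None)
        if health is None or health >= threshold:
            # No statistics row found, or health is already acceptable.
            return False
        cur.execute(f"ANALYZE TABLE `{db}`.`{table}`")
        return True
    finally:
        cur.close()
```

Calling this from a scheduled job during off-peak hours, after the deleted data has been garbage collected, keeps the extra load of ANALYZE TABLE away from busy periods.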
Can the suggestions in the answers above be used to optimize this?