The region average written keys metric of a TiKV node suddenly surged, causing business latency

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 一个tikv节点的region average written keys指数突然暴增,导致业务卡顿。

| username: TiDBer_Y2d2kiJh

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0 (2 TiDB, 3 PD, 4 TiKV, 2 HA nodes)
[Reproduction Path] The region average written keys metric of one TiKV node suddenly surged, causing the application to stutter. What causes this, and how can it be avoided?
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]

| username: 像风一样的男子 | Original post link

Check Top SQL for this TiKV instance at that point in time on TiDB Dashboard to see whether there are any slow queries.

| username: TiDBer_Y2d2kiJh | Original post link

I see many DELETE statements, and one of them takes about 8.1 seconds to execute.

| username: 像风一样的男子 | Original post link

Large-scale data deletion tends to get slower as it progresses, but it can be optimized.

| username: zhanggame1 | Original post link

How to optimize?

| username: 像风一样的男子 | Original post link

If a large amount of data is deleted, the table's health score may drop, which can lead the cost-based optimizer (CBO) to choose poor query plans. Simply put, outdated statistics give the optimizer wrong row estimates (for example, leading it to pick an inefficient index), so execution gets slower and slower. After confirming that the deleted data has been garbage collected, it is recommended to run ANALYZE TABLE, preferably when the system is not busy, to re-collect statistics and restore the table's health. Large-scale data deletions are also best performed during off-peak hours.
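The advice above (wait for GC, check table health, then run ANALYZE TABLE off-peak) can be sketched as a small planning helper. This is a minimal sketch under stated assumptions, not a TiDB tool: the function names, the health threshold of 80 and the 01:00-05:59 off-peak window are all hypothetical. It assumes the caller supplies the `(db, table, healthy)` rows, for example from the `Db_name`, `Table_name` and `Healthy` columns of `SHOW STATS_HEALTHY`, and supplies the current GC safe point as a `datetime` (TiDB records it in the `tikv_gc_safe_point` row of `mysql.tidb`; parsing that value is left out here).

```python
from datetime import datetime

# Hypothetical thresholds: tune them for your own workload.
HEALTH_THRESHOLD = 80          # ANALYZE tables whose Healthy score is below this
OFF_PEAK_HOURS = range(1, 6)   # assume 01:00-05:59 local time is quiet


def plan_analyze(health_rows, now, delete_finished_at, gc_safe_point,
                 threshold=HEALTH_THRESHOLD, off_peak=OFF_PEAK_HOURS):
    """Return the ANALYZE TABLE statements worth running right now.

    health_rows: (db_name, table_name, healthy) tuples, e.g. taken from
        SHOW STATS_HEALTHY (one row per partition for partitioned tables).
    now: current local time.
    delete_finished_at: when the large DELETE finished.
    gc_safe_point: current GC safe point as a datetime.
    """
    # Wait until GC has moved past the deletion, so the old versions are gone.
    if gc_safe_point <= delete_finished_at:
        return []
    # Only run inside the assumed off-peak window.
    if now.hour not in off_peak:
        return []
    statements = []
    seen = set()
    for db, table, healthy in health_rows:
        # Partitions of the same table produce one statement for the whole table.
        if healthy < threshold and (db, table) not in seen:
            seen.add((db, table))
            statements.append(f"ANALYZE TABLE `{db}`.`{table}`")
    return statements


def run(cursor, statements):
    """Execute the planned statements with any DB-API 2.0 cursor (e.g. PyMySQL)."""
    for stmt in statements:
        cursor.execute(stmt)


if __name__ == "__main__":
    rows = [("shop", "orders", 42), ("shop", "users", 97)]
    print(plan_analyze(rows, datetime(2023, 6, 1, 2, 30),
                       delete_finished_at=datetime(2023, 6, 1, 0, 0),
                       gc_safe_point=datetime(2023, 6, 1, 1, 0)))
    # prints ['ANALYZE TABLE `shop`.`orders`']
```

Keeping the decision logic separate from execution makes it easy to review the planned statements before running them against a production cluster.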

| username: 昵称想不起来了 | Original post link

So the optimization is to follow the answer above?