Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: TIDB线程卡死
[TiDB Usage Environment] Production Environment
[TiDB Version] 7.5.0
[Reproduction Path] Operations performed that led to the issue
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST WHERE time > 300;
[Encountered Issue: Phenomenon and Impact]
Querying records where the execution time of threads is greater than 300ms, it was found that there was one SQL with a very long execution time, occupying 17GB of memory. It is unclear whether this occurred during dashboard query alerts or other pages. Executing the KILL statement could not terminate it, ultimately causing TiDB to become unstable and frequently interrupted.
The SQL is as follows:
SELECT FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time, FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time, ANY_VALUE(digest_text) AS agg_digest_text, ANY_VALUE(digest) AS agg_digest, SUM(exec_count) AS agg_exec_count, SUM(sum_latency) AS agg_sum_latency, MAX(max_latency) AS agg_max_latency, MIN(min_latency) AS agg_min_latency, CAST(SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED) AS agg_avg_latency, ANY_VALUE(schema_name) AS agg_schema_name, COUNT(DISTINCT plan_digest) AS agg_plan_count FROM INFORMATION_SCHEMA
.CLUSTER_STATEMENTS_SUMMARY_HISTORY
WHERE summary_begin_time <= FROM_UNIXTIME(?) AND summary_end_time >= FROM_UNIXTIME(?) GROUP BY schema_name, digest ORDER BY agg_sum_latency DESC