TiDB Thread Hang

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB线程卡死 (TiDB thread hang)

| username: 飞翔的草莓

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.5.0
[Reproduction Path] Operations performed that led to the issue
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST WHERE time > 300;
[Encountered Issue: Phenomenon and Impact]
When querying for sessions whose execution time exceeds 300 seconds, we found one SQL statement that had been running for a very long time and was occupying 17 GB of memory. It appears to come from the Dashboard, though it is unclear from which page (the slow query alerts page or another one). Executing a KILL statement could not terminate it, and this eventually made TiDB unstable, with connections frequently interrupted.

The SQL is as follows:
SELECT
  FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time,
  FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time,
  ANY_VALUE(digest_text) AS agg_digest_text,
  ANY_VALUE(digest) AS agg_digest,
  SUM(exec_count) AS agg_exec_count,
  SUM(sum_latency) AS agg_sum_latency,
  MAX(max_latency) AS agg_max_latency,
  MIN(min_latency) AS agg_min_latency,
  CAST(SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED) AS agg_avg_latency,
  ANY_VALUE(schema_name) AS agg_schema_name,
  COUNT(DISTINCT plan_digest) AS agg_plan_count
FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY
WHERE summary_begin_time <= FROM_UNIXTIME(?)
  AND summary_end_time >= FROM_UNIXTIME(?)
GROUP BY schema_name, digest
ORDER BY agg_sum_latency DESC

| username: zhaokede | Original post link

This SQL queries a system table that holds historical statement summary information.

| username: 飞翔的草莓 | Original post link

Under what operation will this SQL be executed?

| username: 有猫万事足 | Original post link

Post the execution plan and let’s take a look.

| username: zhaokede | Original post link

Monitoring tools like dashboards

| username: zhaokede | Original post link

Or maybe you have scheduled some performance statistics analysis tasks in the dashboard.

| username: 小龙虾爱大龙虾 | Original post link

Normally, this shouldn’t consume many resources. It is just the SQL that the Dashboard executes in the background when you open the SQL statement analysis page. You can capture a heap profile to see why it is taking up so much memory.
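A minimal sketch of capturing that heap profile via the Go pprof endpoint on the tidb-server status port (default 10080); the address below is an assumption, substitute your own tidb-server host:

```shell
# Address of the tidb-server status port -- an assumed example, replace
# with your actual host and configured status port (default 10080).
TIDB_STATUS="127.0.0.1:10080"

# Fetch a heap snapshot; `-m 5` caps the wait if the server is unreachable.
curl -s -m 5 "http://${TIDB_STATUS}/debug/pprof/heap" -o tidb.heap \
  || echo "curl failed: check that ${TIDB_STATUS} is reachable"

# Then analyze the snapshot locally (requires a Go toolchain), e.g.:
#   go tool pprof -top tidb.heap
```

The resulting profile shows which allocation sites hold the memory, which should reveal whether the 17 GB really belongs to this statement-summary aggregation.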

| username: vincentLi | Original post link

I have also encountered similar situations. The system’s own queries should be quite efficient, but TiDB is somewhat lacking in this regard.

| username: TiDBer_QKDdYGfz | Original post link

Is the query SQL not finishing execution?

| username: TiDBer_jYQINSnf | Original post link

I encountered this once. While an SQL query was executing, it exceeded the memory quota; killing it triggered a panic. No response was ever sent to the client, so the client connection waited indefinitely for a reply that the server never delivered. It simply hung. This is a bug.
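A common mitigation while the bug is unresolved (a sketch, not a fix for the panic itself) is to cap memory so runaway statements are cancelled earlier; the values below are illustrative examples, not recommendations:

```sql
-- Example values only; tune for your workload.
-- Per-statement memory quota (bytes): a query exceeding it is cancelled.
SET GLOBAL tidb_mem_quota_query = 4294967296;  -- 4 GiB
-- Instance-level limit (v6.4+): when exceeded, TiDB terminates the
-- SQL operation currently using the most memory.
SET GLOBAL tidb_server_memory_limit = '24GB';
```

With the instance-level limit in place, tidb-server should shed the top memory consumer before the whole process becomes unstable.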

| username: zhaokede | Original post link

The original poster is using version 7.5, and it’s unclear whether the new version 8.0 has resolved this issue.

| username: FutureDB | Original post link

How was it killed?