TiDB Thread Hang

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB线程卡死 (TiDB thread hang)

| username: 飞翔的草莓

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.5.0
[Reproduction Path] Operations performed that led to the issue
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST WHERE time > 300;
[Encountered Issue: Phenomenon and Impact]
When querying for sessions whose execution time exceeds 300 seconds, we found one SQL statement that had been running for a very long time and was occupying 17 GB of memory. It appears to come from the Dashboard, though it is unclear from which page (the slow query alerts page or another one). Executing a KILL statement could not terminate it, and this eventually made TiDB unstable, with connections frequently interrupted.

The SQL is as follows:
SELECT
  FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time,
  FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time,
  ANY_VALUE(digest_text) AS agg_digest_text,
  ANY_VALUE(digest) AS agg_digest,
  SUM(exec_count) AS agg_exec_count,
  SUM(sum_latency) AS agg_sum_latency,
  MAX(max_latency) AS agg_max_latency,
  MIN(min_latency) AS agg_min_latency,
  CAST(SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED) AS agg_avg_latency,
  ANY_VALUE(schema_name) AS agg_schema_name,
  COUNT(DISTINCT plan_digest) AS agg_plan_count
FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY
WHERE summary_begin_time <= FROM_UNIXTIME(?)
  AND summary_end_time >= FROM_UNIXTIME(?)
GROUP BY schema_name, digest
ORDER BY agg_sum_latency DESC

| username: zhaokede | Original post link

This SQL queries a system table that holds historical statement summary information.

| username: 飞翔的草莓 | Original post link

Under what operation will this SQL be executed?

| username: 有猫万事足 | Original post link

Post the execution plan and let’s take a look.

| username: zhaokede | Original post link

Monitoring tools like dashboards

| username: zhaokede | Original post link

Or maybe you have scheduled some performance statistics analysis tasks in the dashboard.

| username: 小龙虾爱大龙虾 | Original post link

Normally, this shouldn’t consume many resources. It is just the SQL that the Dashboard executes in the background when you open the SQL statement analysis page. You can capture a heap profile to see why it is taking up so much memory.
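A minimal sketch of capturing that heap profile via the Go pprof endpoint on the tidb-server status port (default 10080); the address below is an assumption, substitute your own tidb-server host:

```shell
# Address of the tidb-server status port -- an assumed example, replace
# with your actual host and configured status port (default 10080).
TIDB_STATUS="127.0.0.1:10080"

# Fetch a heap snapshot; `-m 5` caps the wait if the server is unreachable.
curl -s -m 5 "http://${TIDB_STATUS}/debug/pprof/heap" -o tidb.heap \
  || echo "curl failed: check that ${TIDB_STATUS} is reachable"

# Then analyze the snapshot locally (requires a Go toolchain), e.g.:
#   go tool pprof -top tidb.heap
```

The resulting profile shows which allocation sites hold the memory, which should reveal whether the 17 GB really belongs to this statement-summary aggregation.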

| username: vincentLi | Original post link

I have also encountered similar situations. The system’s own queries should be quite efficient, but TiDB is somewhat lacking in this regard.

| username: TiDBer_QKDdYGfz | Original post link

Is the query SQL not finishing execution?

| username: TiDBer_jYQINSnf | Original post link

I encountered this once. While an SQL query was executing, it exceeded the memory quota; killing it triggered a panic. No response was ever sent to the client, so the client connection waited indefinitely for a reply that the server never delivered. It simply hung. This is a bug.
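A common mitigation while the bug is unresolved (a sketch, not a fix for the panic itself) is to cap memory so runaway statements are cancelled earlier; the values below are illustrative examples, not recommendations:

```sql
-- Example values only; tune for your workload.
-- Per-statement memory quota (bytes): a query exceeding it is cancelled.
SET GLOBAL tidb_mem_quota_query = 4294967296;  -- 4 GiB
-- Instance-level limit (v6.4+): when exceeded, TiDB terminates the
-- SQL operation currently using the most memory.
SET GLOBAL tidb_server_memory_limit = '24GB';
```

With the instance-level limit in place, tidb-server should shed the top memory consumer before the whole process becomes unstable.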

| username: zhaokede | Original post link

The original poster is using version 7.5, and it’s unclear whether the new version 8.0 has resolved this issue.

| username: FutureDB | Original post link

How was it killed?