TIDB analyze table is stuck and remains in running state

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB analyze table 卡住, 一直处于运行中

| username: jiangdj1992

[TiDB Usage Environment] Production Environment
[TiDB Version] 4.0.8
[Reproduction Path] analyze table table_name;
[Encountered Problem: Phenomenon and Impact]
Execution is continuously running and stuck, logs show that the execution SQL cost is relatively high.
[2024/04/29 15:51:35.217 +08:00] [WARN] [expensivequery.go:168] [expensive_query] [cost_time=60.066512705s] [conn_id=163533] [user=root] [database=teaching_study_muster_wb] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“analyze table user_learntime”]

Table row count: 23,000 rows
Check status:
±-------------------------±-----------------±---------------±------------------------±---------------±--------------------±--------+
| Table_schema | Table_name | Partition_name | Job_info | Processed_rows | Start_time | State |
±-------------------------±-----------------±---------------±------------------------±---------------±--------------------±--------+
| teaching | user_learntime | | analyze index IX_userId | 23165 | 2024-04-29 15:50:35 | running |
| teaching | user_learntime | | analyze columns | 23165 | 2024-04-29 15:50:35 | running |
| dba | killed_sql_table | | analyze columns | 0 | 2024-04-29 16:24:26 | running |

Cluster Information: 6 TiKV, 3 PD, 3 TiDB

| username: h5n1 | Original post link

Is it because there are too many MVCC versions in the table? Try running an EXPLAIN ANALYZE on a full table scan SQL to check.

| username: hacker_77powerful | Original post link

This table has only 23,000 rows, and it got stuck. I guess the version is too old.

| username: jiangdj1992 | Original post link

How should this be handled? Tables with 0 rows are all stuck.

| username: jiangdj1992 | Original post link

Version 4.0.8 has not been upgraded.

| username: h5n1 | Original post link

Show table xxx regions or check tikv_region_status to see how many regions this table has.

| username: jiangdj1992 | Original post link

mysql> explain select * from user_learntime ;
±----------------------±---------±----------±---------------------±---------------------+
| id | estRows | task | access object | operator info |
±----------------------±---------±----------±---------------------±---------------------+
| TableReader_5 | 23165.00 | root | | data:TableFullScan_4 |
| └─TableFullScan_4 | 23165.00 | cop[tikv] | table:user_learntime | keep order:false |
±----------------------±---------±----------±---------------------±---------------------+
2 rows in set (0.00 sec)

| username: jiangdj1992 | Original post link

mysql> show table user_learntime regions;
±----------±----------±--------±----------±----------------±---------------------------±-----------±--------------±-----------±---------------------±-----------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
±----------±----------±--------±----------±----------------±---------------------------±-----------±--------------±-----------±---------------------±-----------------+
| 285209 | t_984_ | t_986_ | 3813789 | 10 | 3813789, 3866620, 10958513 | 0 | 1683 | 0 | 3 | 40960 |
±----------±----------±--------±----------±----------------±---------------------------±-----------±--------------±-----------±---------------------±-----------------+
1 row in set (0.01 sec)

| username: jiangdj1992 | Original post link

It feels like there’s an issue with the cluster. Every table gets stuck, not just a specific one. The logs don’t show any problems, and the monitoring seems normal as well.

| username: h5n1 | Original post link

Can you see the status of each component? This version probably doesn’t have tiup, right?

| username: DBAER | Original post link

Is it possible that this parameter was adjusted, causing frequent analyze operations?

| username: 裤衩儿飞上天 | Original post link

I also encountered this in version 4.0.8. I had tens of thousands of rows, and the analyze couldn’t complete.
Check if your auto analyze has also been stopped for a long time.

I restarted all the TiDB servers, and then it worked. It seems that one of the TiDB servers was stuck.
There were no error reports, so the issue couldn’t be investigated.

If it doesn’t work, just upgrade~ :smiling_imp:

| username: jiangdj1992 | Original post link

Starting from version 4.0, TiUP is available, and all components are in normal status.

| username: jiangdj1992 | Original post link

Indeed, adjustments have been made. Both the time and threshold have been adjusted, from 0.5 to 0.3, and the time limit range has been restricted.

| username: jiangdj1992 | Original post link

What you said is indeed the case. I haven’t analyzed it for a long time. I have a SQL query that should be quick but takes a particularly long time to execute. Although it shows as executing, it appears as sleep status in the process list, and the TiDB node becomes unresponsive. Later, I killed all the sleeping processes, and then it stopped being unresponsive, but that SQL query still couldn’t complete.

| username: WalterWj | Original post link

I recommend upgrading. It seems like some bug might have been triggered. It shouldn’t be stuck.

This version is already EOL.

| username: TiDBer_21wZg5fm | Original post link

Insufficient server resources?

| username: zhanggame1 | Original post link

Earlier versions had many bugs.

| username: jiangdj1992 | Original post link

The restart has returned to normal, and the analysis has become much faster.