TiDB 5.4.0 - SQL Stuck

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 5.4.0-sql卡住

| username: magongyong

【TiDB Usage Environment】Production Environment / Testing / Poc
【TiDB Version】5.4.0
【Encountered Problem】
Running the SQL once is fine, results come out in 2 seconds, but running it multiple times causes it to get stuck. The table data volume is not large, the largest table does not exceed 1 million, as shown in the picture below:

The SQL running in the production environment has been stuck for over 3 million seconds.

【Reproduction Path】Operations performed that led to the problem

【Problem Phenomenon and Impact】
SQL gets stuck, and the page query does not return results.

【Attachments】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: magongyong | Original post link

After testing, version 5.4.2 also has this issue.
The SQL contains both right join and left join. The first limit pagination query works fine, but the second one gets stuck.

| username: buddyyuan | Original post link

After the trace command, follow it with your SQL statement and post the results of the second execution for us to review.

| username: magongyong | Original post link

Execution fails, it directly freezes.

| username: buddyyuan | Original post link

Just use analyze to check the execution plan.

| username: magongyong | Original post link

| username: buddyyuan | Original post link

“EXPLAIN does not actually execute, it only views the execution plan. EXPLAIN ANALYZE actually executes and views the execution plan at the same time.”

| username: magongyong | Original post link

The “explain analyze” execution got stuck on the first run.

| username: magongyong | Original post link

It seems to be stuck during parsing.

| username: buddyyuan | Original post link

You directly analyze and send out the execution plan to have a look. It should be the execution plan that’s not working.

| username: magongyong | Original post link

When actually executing the SQL, the first time is very fast, it takes seconds, but the second time it gets stuck. It’s been over a month without finding out why. If the execution plan is not good, it would still produce a result, albeit slower. The current situation is that the second execution directly gets stuck and does not produce a result.

| username: magongyong | Original post link

Here is the execution plan

| username: forever | Original post link

What about collecting statistics?

| username: 人如其名 | Original post link

Brother, capture the goroutines and take a look. The capture method is:
curl http://:10080/debug/pprof/goroutine?debug=2 >goroutines.txt

Then upload it and take a look.

| username: magongyong | Original post link

goroutines.txt (1.7 MB)

| username: magongyong | Original post link

This is what I captured when my SQL execution got stuck. Please take a look, thanks.

| username: magongyong | Original post link

I found that the end of the SQL that got stuck when I executed it was limit 32000,4000, while in the show full processlist the end of the SQL was limit 28000,4000.

| username: magongyong | Original post link

Tried analyze, but it didn’t work.

| username: 人如其名 | Original post link

This is a BUG, refer to: indexHashJoin hang in handleTask · Issue #35638 · pingcap/tidb · GitHub
Fixed in 5.4.3, it is recommended to apply the patch and upgrade.

| username: tidb狂热爱好者 | Original post link

It should be a code issue.