I have an SQL query that, when executed, causes the TiFlash thread count to explode.
The data volume is not large, so I suspect it might have triggered a bug.
I noticed a resolved issue in version 6.1.1 that is similar to my problem:
However, after upgrading to 6.1.1, the problem still exists, as shown in the image below:
The SQL is quite complex and cannot be simplified (a simplified version does not reproduce the issue).
Moreover, the issue depends on the data; the same SQL does not trigger it in a test environment with different data.
Sorry for the late response.
The explain result is in the attachment. The explain analyze could not produce a result; it failed with a thread resource exhaustion error.
Hello, thank you for the feedback. Could you please send the coprocessor panel from the tiflash summary monitoring for the time when the thread exhaustion error occurred? You can use the PingCAP MetricsTool to export all the monitoring data from tiflash summary and tiflash proxy details around the time the issue occurred.
Sorry for the delay; the cluster only became available again after the stress test finished.
I have some additional, and more interesting, information:
After the stress test, without touching the cluster, running the previously problematic SQL does not produce errors anymore. The explain analyze result is as follows: explain_analyze.txt (291.0 KB)
Restarting TiFlash and running again, still no errors.
Restarting the entire cluster and running again, the error appears, with 2 nodes using over 5k threads and not releasing them.
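For anyone who wants to cross-check the thread count outside the monitoring panels, here is a minimal sketch. It assumes a Linux host and reads the `Threads:` field from `/proc/<pid>/status`; the helper names `parse_thread_count` and `thread_count` are hypothetical and not part of any TiDB or TiFlash tooling.

```python
import sys
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            # The line looks like "Threads:\t5123"
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid: int) -> int:
    """Read the current thread count of a process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Usage: python3 threads.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        print(arg, thread_count(int(arg)))
```

Running it with the TiFlash process PID on each node before and after the query should show whether the threads are released once the query ends.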
Screenshots and metrics are as follows: