KILL tidb +id not effective

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: KILL tidb +id不生效

| username: l940399478

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.1.4
[Encountered Problem] kill tidb +id not effective
[Problem Phenomenon and Impact]
High load on tikv nodes
Phenomenon as follows:

mysql> SELECT id, COMMAND, TIME, STATE, TxnStart FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST where TIME>1000 and COMMAND!='Sleep'\G;
*************************** 1. row ***************************
      id: 45516433
 COMMAND: Query
    TIME: 145610
   STATE: autocommit
TxnStart: 10-10 17:55:31.808(436573498719076387)
*************************** 2. row ***************************
      id: 45517293
 COMMAND: Query
    TIME: 145492
   STATE: autocommit
TxnStart: 10-10 17:57:30.007(436573529704235183)
2 rows in set (0.01 sec)

ERROR: 
No query specified

mysql> kill tidb 45516433; kill tidb 45517293;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT id, COMMAND, TIME, STATE, TxnStart FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST where TIME>1000 and COMMAND!='Sleep'\G;
*************************** 1. row ***************************
      id: 45516433
 COMMAND: Query
    TIME: 145668
   STATE: autocommit
TxnStart: 10-10 17:55:31.808(436573498719076387)
*************************** 2. row ***************************
      id: 45517293
 COMMAND: Query
    TIME: 145550
   STATE: autocommit
TxnStart: 10-10 17:57:30.007(436573529704235183)
2 rows in set (0.03 sec)

ERROR: 
No query specified
| username: polars | Original post link

Is TiDB multi-node? Do I need to connect to the node where the session is located to kill it?

| username: l940399478 | Original post link

The kill command was executed on the corresponding nodes, and some were successfully killed, but these two remain and can’t be killed no matter what. Additionally, the load on the TiKV nodes has been consistently high, indicating that they haven’t been killed. However, the client reports that the queries were canceled a long time ago.

| username: BraveChen | Original post link

Restarting the TiDB node can kill the process.

| username: BraveChen | Original post link

Can you post the SQL that you can’t kill?

| username: TiDBer_CEVsub | Original post link

Go to the corresponding node server to kill.

| username: l940399478 | Original post link

/* ApplicationName=DataGrip 2022.2.4 */ select a.userId, a.productId, a.groupId, b.assistantId from dwd_user_group_l a left join dwd_assistant_group_l b on a.productId = b.productId and a.groupId = b.groupId

where a.status = 0 and a.userId not like ‘qa%’ and a.userId not like ‘%test%’ order by a.joinGroupTime desc limit 100000

| username: l940399478 | Original post link

It was killed on the corresponding server.

| username: l940399478 | Original post link

What is the impact on the business if I restart one of the two TiDB nodes? Will there be a brief interruption?

| username: polars | Original post link

Requests on this node will be disconnected, so the application needs to be able to automatically reconnect. Generally, this can be handled by configuring a connection pool.

| username: BraveChen | Original post link

It’s not entirely accurate to say that the similarities and the issues we encountered are exactly the same, but they are 90% similar. For large limits, use EXPLAIN ANALYZE to check if there is an index hash join in the execution plan. If there is, use a hint to force the join method to be specified as an index join, which can bypass this issue and prevent zombie sessions from occurring.