After upgrading from 7.1.4 to 7.1.5, many transactions are suspended on one node

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 7.1.4升级到7.1.5后一个节点挂起很多事物

| username: Mingdr

[Test Environment] TiDB
[TiDB Version] 7.1.5
[Reproduction Path] Upgrade from 7.1.4 to 7.1.5
[Encountered Issue: Phenomenon and Impact]
Many transactions are suspended:


Most of them are on this node.
The TiKV log of this node:

The log doesn’t seem to show a problem. How can I determine what these connections are doing?

| username: ziptoam | Original post link

See if there are more detailed logs, and check who initiated the transaction.
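One way to check who initiated the sessions is to group the processlist by client address. The sketch below is only an illustration: it assumes the rows were already fetched (for example from `INFORMATION_SCHEMA.CLUSTER_PROCESSLIST` or `SHOW PROCESSLIST`) into dictionaries with `HOST`, `USER`, and `COMMAND` keys, and the sample rows are made up, not taken from this cluster.

```python
from collections import Counter


def client_ip(host: str) -> str:
    """Strip the port from a processlist HOST value such as '192.168.0.241:40001'."""
    return host.rsplit(":", 1)[0] if ":" in host else host


def summarize_sessions(rows):
    """Count sessions per (client IP, user, command), most frequent first."""
    counts = Counter()
    for row in rows:
        counts[(client_ip(row["HOST"]), row["USER"], row["COMMAND"])] += 1
    return counts.most_common()


# Hypothetical sample rows; real rows would come from the processlist.
SAMPLE_ROWS = [
    {"ID": 101, "USER": "root", "HOST": "192.168.0.241:40001", "COMMAND": "Sleep"},
    {"ID": 102, "USER": "root", "HOST": "192.168.0.241:40002", "COMMAND": "Sleep"},
    {"ID": 103, "USER": "root", "HOST": "192.168.0.241:40003", "COMMAND": "Sleep"},
    {"ID": 104, "USER": "root", "HOST": "192.168.0.10:51000", "COMMAND": "Query"},
]

if __name__ == "__main__":
    for (ip, user, command), n in summarize_sessions(SAMPLE_ROWS):
        print(f"{ip:15} {user:8} {command:8} {n}")
```

If one address dominates the list, the next step is to look at what is running on that host rather than at TiDB itself.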

| username: xfworld | Original post link

That is compaction, a routine background job.

But what kind of machine is 192.168.0.241? Why does it have a bunch of Sleep sessions?

| username: Mingdr | Original post link

I haven’t noticed any anomalies in daily use, but there are always many connections. This is a test database, and I’m the only one using it:

| username: Mingdr | Original post link

This machine only serves as a TiKV node, so it is very strange that it always has these connections.

| username: xfworld | Original post link

It is not a big issue; compaction runs automatically.

What I’m curious about is why there are 4 TiKV instances. Why an even number?

| username: zhanggame1 | Original post link

The number of TiKV instances can be even. It is the number of replicas that is generally odd, and the number of TiKV instances should be greater than or equal to the number of replicas.
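The reasoning can be sketched with a little arithmetic, assuming the usual Raft majority rule (a write needs more than half of the replicas). The function names below are made up for illustration and are not part of any TiDB tool.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still survives."""
    return replicas - quorum(replicas)


def enough_stores(tikv_instances: int, replicas: int) -> bool:
    """Each replica of a Region needs its own TiKV store."""
    return tikv_instances >= replicas


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"replicas={n} quorum={quorum(n)} tolerated={tolerated_failures(n)}")
    print("4 TiKV instances, 3 replicas OK:", enough_stores(4, 3))
```

Going from 3 to 4 replicas raises the quorum without tolerating any additional failure, which is why the replica count is normally odd, while the number of TiKV instances is free to be even.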

| username: Mingdr | Original post link

But why is only that one node compacting?

| username: xfworld | Original post link

Is it possible that the number of replicas is not configured properly, causing data skew?

TiKV compaction is quite common, and TiDB GC triggers it.
That GC event drives all TiKV nodes to compact.

| username: Mingdr | Original post link

No way, isn’t it fine as long as it’s greater than three? And I’ve upgraded all the way from 7.1.0, and this has never happened before.

| username: WalterWj | Original post link

It looks like these connections are simply idle. The front end may already have disconnected, or it connected and then did nothing. This has nothing to do with the database. If necessary, restart the tidb-server to drop the connections, or terminate a session with KILL TIDB <session_id>.
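If there are many idle sessions, the kill statements can be generated instead of typed by hand. This is a minimal sketch under assumptions: the rows are processlist rows already loaded as dictionaries with `ID`, `HOST`, `COMMAND`, and `TIME` (seconds) keys, the sample data is invented, and the 60-second threshold is an arbitrary choice. It only builds the statements and does not run them.

```python
def kill_statements(rows, client_ip, min_idle_seconds=60):
    """Build KILL TIDB statements for long-idle Sleep sessions from one client IP."""
    statements = []
    for row in rows:
        ip = row["HOST"].rsplit(":", 1)[0]  # drop the client port
        is_idle = row["COMMAND"] == "Sleep" and row["TIME"] >= min_idle_seconds
        if ip == client_ip and is_idle:
            statements.append(f"KILL TIDB {int(row['ID'])};")
    return statements


# Hypothetical sample rows for illustration only.
SAMPLE_ROWS = [
    {"ID": 101, "HOST": "192.168.0.241:40001", "COMMAND": "Sleep", "TIME": 120},
    {"ID": 102, "HOST": "192.168.0.241:40002", "COMMAND": "Sleep", "TIME": 300},
    {"ID": 103, "HOST": "192.168.0.241:40003", "COMMAND": "Sleep", "TIME": 5},
    {"ID": 104, "HOST": "192.168.0.10:51000", "COMMAND": "Query", "TIME": 400},
]

if __name__ == "__main__":
    for stmt in kill_statements(SAMPLE_ROWS, "192.168.0.241"):
        print(stmt)
```

Review the generated statements before running them. If the client keeps a connection pool, the sessions will simply be re-created after they are killed.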

| username: Mingdr | Original post link

Same result. I killed the sessions first, and they immediately reconnected. Then I restarted the cluster, and the connections were still there.

| username: WalterWj | Original post link

Then it definitely has nothing to do with the database. A client on the front end is creating these connections, and they stay open because they are long-lived (persistent) connections.

| username: Mingdr | Original post link

The question is why all the connections come from this node, which is only a TiKV node.

| username: WalterWj | Original post link

Is there HAProxy or an application in front of TiDB? TiKV itself does not connect to the TiDB server.

| username: Mingdr | Original post link

Found the cause: a DM service had been deployed.