TiDB 7.1 Cluster Stuck, Reports "lock txn not found, lock has expired"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 7.1集群卡住,报 lock txn not found, lock has expired

| username: TiDBer_yyy

[TiDB Usage Environment] Test/PoC
[TiDB Version] 7.1.0
[Reproduction Path] Unable to connect to the cluster for a long time
[Encountered Problem: Symptoms and Impact]
The cluster has been unreachable for a long time and reports the error below. Restarting the cluster does not resolve the issue.

[2023/07/25 20:12:03.322 +08:00] [WARN] [lock_resolver.go:667] ["lock txn not found, lock has expired"] [CallerStartTs=0] ["lock str"="key: 7480000000000000285f7280000000012c8e03, primary: 7480000000000000165f698000000000000002038000000000005b42, txnStartTS: 443098245297340417, lockForUpdateTS:443098245572591655, minCommitTs:443098245572591656, ttl: 21090, type: PessimisticLock, UseAsyncCommit: false, txnSize: 0"]
[2023/07/25 20:12:03.656 +08:00] [INFO] [2pc.go:1195] ["send TxnHeartBeat"] [startTS=443098593398358050] [newTTL=151000]
[2023/07/25 20:12:04.332 +08:00] [WARN] [lock_resolver.go:667] ["lock txn not found, lock has expired"] [CallerStartTs=0] ["lock str"="key: 7480000000000000285f7280000000012c8e04, primary: 7480000000000000165f698000000000000002038000000000005b42, txnStartTS: 443098245297340417, lockForUpdateTS:443098245572591655, minCommitTs:443098245572591656, ttl: 21090, type: PessimisticLock, UseAsyncCommit: false, txnSize: 0"]
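
The keys printed in these warnings encode the table (and row/index) that the leftover pessimistic lock belongs to. A minimal sketch for decoding them, assuming a SQL client can still reach the cluster; the hex strings are copied from the log lines above:

-- Decode the locked key and the primary lock key from the warnings.
SELECT TIDB_DECODE_KEY('7480000000000000285f7280000000012c8e03');
SELECT TIDB_DECODE_KEY('7480000000000000165f698000000000000002038000000000005b42');

-- Map the decoded table_id back to a table name (replace 40 with the
-- table_id returned by TIDB_DECODE_KEY above).
SELECT table_schema, table_name
FROM information_schema.tables
WHERE tidb_table_id = 40;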

[Attachment: Screenshot/Log/Monitoring]

| username: tidb狂热爱好者 | Original post link

This is probably caused by a pessimistic lock being held. You just need to check the application side.

| username: TiDBer_yyy | Original post link

What operation is this? Restarting the cluster still results in this error.

Currently unable to log in to the TiDB client.

| username: TiDBer_yyy | Original post link

By restarting tidb-server and logging in immediately afterwards, I was able to get in and check the following:

  • Checked the processlist; only the analyze statement collecting statistics was running.
    The table has 200,000 rows, and collecting statistics on it takes more than 1,000 seconds. Not sure if that is related.

+---------------------+------+-----------------+-------------------------------------------------------------------+------+
| id                  | user | host            | info                                                              | time |
+---------------------+------+-----------------+-------------------------------------------------------------------+------+
| 8996164556044632577 | root | 127.0.0.1:23619 | select id,user,host,info,time from information_schema.processlist |    0 |
| 8996164556044632067 |      |                 | analyze table `stat`.`table_name1`                                | 1047 |
+---------------------+------+-----------------+-------------------------------------------------------------------+------+
  • Disabled automatic statistics collection and restarted tidb-server; the cluster returned to normal (verification sketch below).
set global tidb_enable_auto_analyze=0;
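
For reference, a minimal sketch of verifying the change after the restart, and of re-enabling auto analyze later once the cluster is stable (same variable as above):

-- Confirm auto analyze is now disabled (expect OFF).
SHOW GLOBAL VARIABLES LIKE 'tidb_enable_auto_analyze';

-- Re-enable it later, e.g. after the statistics bug has been ruled out.
SET GLOBAL tidb_enable_auto_analyze = 1;
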
| username: tidb菜鸟一只 | Original post link

Check whether the locked table is a system table. I suspect an automatic statistics collection task hit a bug, held a lock on a system table without releasing it, and that is what blocks logins.
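
One way to check this once a connection is available is to look at the analyze job history. A minimal sketch, assuming TiDB v6.1 or later (which a 7.1 cluster is), where SHOW ANALYZE STATUS and mysql.analyze_jobs are available:

-- Recent analyze jobs, including ones still running or failed.
SHOW ANALYZE STATUS;

-- The same information in table form, with failure reasons.
SELECT table_schema, table_name, state, fail_reason, start_time, end_time
FROM mysql.analyze_jobs
ORDER BY start_time DESC
LIMIT 10;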

| username: TiDBer_yyy | Original post link

Hello,
We haven’t been able to reproduce the issue for now. Do you have a related bug list?

| username: tidb菜鸟一只 | Original post link

There is a similar one, but it’s version 5.3. This bug shouldn’t still exist in version 7.1, right…
Here’s another one: insert to mysql.stats_buckets yields error Data too long for column ‘lower_bound’ at row 1 · Issue #30925 · pingcap/tidb (github.com)

| username: TiDBer_yyy | Original post link

The error message is very similar.

Cluster working background and operation steps:

  1. Lightning was importing data; after about 1 hour the import was interrupted with a timeout error when connecting to the database.

  2. After restarting the TiDB cluster, connections to the database were still stuck.

  3. Restarted tidb-server, reconnected and logged in quickly right after it came up, disabled automatic statistics collection, then restarted again; the cluster recovered.

| username: redgame | Original post link

It is usually caused by lock conflicts in TiKV.
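
If a SQL connection can be obtained, a hedged way to see current lock conflicts from the SQL layer is the DATA_LOCK_WAITS and DEADLOCKS tables (available since TiDB 5.1); a minimal sketch:

-- Transactions currently blocked waiting for a pessimistic lock, and who holds it.
SELECT `KEY`, TRX_ID, CURRENT_HOLDING_TRX_ID, SQL_DIGEST_TEXT
FROM information_schema.data_lock_waits;

-- Recent deadlock history on this TiDB instance, if any.
SELECT * FROM information_schema.deadlocks;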

| username: TiDBer_yyy | Original post link

How to locate it?

| username: Rilakkuma | Original post link

Did you perform any operations before being unable to connect to the cluster for a long time? Did you kill TiDB? It could be due to leftover locks from previous transactions. TiDB needs to wait for the wait-for-lock-timeout to clean up the locks. If there are many leftover locks, this time could be very long. You can wait for a round of GC to let GC clean up the locks, and then try again.
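
To check whether GC has actually advanced (and therefore had a chance to resolve the leftover locks), the GC bookkeeping rows in mysql.tidb can be read directly; a minimal sketch:

-- GC leader, last run time, safe point, life time and run interval.
SELECT variable_name, variable_value
FROM mysql.tidb
WHERE variable_name LIKE 'tikv_gc%';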

| username: TiDBer_yyy | Original post link

  1. I was importing data using Lightning, but the data import was interrupted.

  2. I didn’t directly kill TiDB; I restarted the TiDB server using tiup restart, and the restart process was very slow.

Connection error:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104

| username: cassblanca | Original post link

Resolving lock conflicts in optimistic transactions: Troubleshooting write conflicts in the optimistic transaction model | PingCAP Documentation Center
Hope this can help you.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.