What are the mitigation methods when writes block reads with key is locked (backoff or cleanup)?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 写阻塞读 key is locked (backoff or cleanup) ,有什么缓解的方法?

| username: erwadba

To improve efficiency, please provide the following information. A clear problem description will help resolve the issue faster:
[TiDB Usage Environment]
[Overview] Scenario + Problem Overview
I have searched the forum articles; generally this error does not affect the application.
However, our application reports similar errors and some queries are affected. Are there any mitigation methods?
Key is locked (will clean up) primary_lock: - TiDB - TiDB Q&A Community (asktug.com)

[2022/10/31 09:48:26.307 +08:00] [INFO] [conn.go:1069] ["command dispatched failed"] [conn=19166671] [connInfo="id:19166671, addr:10.205.238.139:37336 status:10, collation:utf8_general_ci, user:prod_xx"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="SELECT MIN(CheckPoint)-1,MAX(CheckPoint) FROM po_xxx_01 WHERE CheckPoint>3267009"] [txn_mode=PESSIMISTIC] [err="other error: key is locked (backoff or cleanup) primary_lock: 7480000000000000FF5F69800000000000000503800000000031D63E lock_version: 437041471491670093 key: 7480000000000000FF5F69800000000000000503800000000031D9C2 lock_ttl: 3000 txn_size: 2 use_async_commit: true min_commit_ts: 437041471491670095\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:913\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:755\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:645\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:382\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"]

[Background] Operations performed

[Phenomenon] Business and database phenomena

[Problem] Current issues encountered

[Business Impact]

[TiDB Version]

tidb_version(): Release Version: v5.3.0
Edition: Community
Git Commit Hash: 4a1b2e9fe5b5afb1068c56de47adb07098d768d6
Git Branch: heads/refs/tags/v5.3.0
UTC Build Time: 2021-11-24 13:32:39
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)

[Application Software and Version]

[Attachments] Relevant logs and configuration information

  • TiUP Cluster Display information
  • TiUP Cluster Edit config information

Monitoring (https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana monitoring
  • TiDB Grafana monitoring
  • TiKV Grafana monitoring
  • PD Grafana monitoring
  • Corresponding module logs (including logs 1 hour before and after the issue)

If the question is related to performance optimization or troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output.

| username: xfworld | Original post link

The transaction is executing for too long. Reducing the amount of work in each transaction, or executing it in smaller segments, can effectively mitigate this issue.
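
As a rough sketch of the "smaller segments" idea: a large write can be committed in small autocommit batches so each transaction releases its locks quickly. The table and column below come from the poster's query, but the predicate, batch size, and the use of DELETE are illustrative assumptions, not the poster's actual workload.

```sql
-- Illustrative only: split one large write into small autocommit batches so
-- each transaction commits quickly and releases its locks sooner.
-- The predicate and batch size are made up; repeat this statement from the
-- application until it affects 0 rows.
DELETE FROM po_xxx_01
WHERE  CheckPoint <= 3267009
LIMIT  1000;
```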

| username: 近墨者zyl | Original post link

Severe read-write conflicts in the business will cause this error.
key is locked indicates a read-write conflict: a read request has encountered uncommitted data and must wait for it to be committed before it can read. A small number of these errors has no impact on the business, but a large number indicates severe read-write conflicts in the business.

Solution: Switch to pessimistic locking mode, do not use optimistic locking.
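
For reference, switching the default transaction mode looks like the sketch below. Note that the poster's log already shows txn_mode=PESSIMISTIC, so this is already in effect in their cluster.

```sql
-- Make pessimistic the default transaction mode for new sessions.
SET GLOBAL tidb_txn_mode = 'pessimistic';

-- Or opt in for a single transaction:
BEGIN PESSIMISTIC;
-- ... DML statements ...
COMMIT;
```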

| username: 近墨者zyl | Original post link

Check out this document; it should help you solve the problem: TiDB 锁冲突问题处理 | PingCAP 文档中心

| username: erwadba | Original post link

Pessimistic transaction mode is already in use.

| username: erwadba | Original post link

My personal guess at the probable cause is as follows:
The cluster is currently configured with set global tidb_replica_read='leader-and-follower'.
The follower may still hold the lock information for the primary key, which causes the issue; in the tidb.log errors, storeAddr shows the follower's address.
After changing the parameter with set global tidb_replica_read='leader' and restarting the TiDB node, the alert disappears.

In the error log entries error="other error: key is locked (backoff or cleanup)" below, the `storeAddr=xx.xx.xx.xx` shown is exactly the follower's address.
[2022/11/01 08:42:02.985 +08:00] [WARN] [coprocessor.go:914] ["other error"] [conn=8233] [txnStartTS=437063076535336968] [regionID=22127261] [storeAddr=10.150.xx.xx:20160] [error="other error: key is locked (backoff or cleanup) primary_lock: 7480000000000001515F698000000000000004038000000000432DF8 lock_version: 437063076522229789 key: 7480000000000001515F698000000000000004038000000000432DFC lock_ttl: 3008 txn_size: 2 use_async_commit: true min_commit_ts: 437063076522229790"]
[2022/11/01 08:42:03.006 +08:00] [INFO] [conn.go:1069] ["command dispatched failed"] [conn=8233] [connInfo="id:8233, addr:10.205.xx.xx:38718 status:10, collation:utf8_general_ci, user:prod_xx"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="SELECT MIN(CheckPoint)-1,MAX(CheckPoint) FROM po_xx_02 WHERE CheckPoint>4402683"] [txn_mode=PESSIMISTIC] [err="other error: key is locked (backoff or cleanup) primary_lock: 7480000000000001515F698000000000000004038000000000432DF8 lock_version: 437063076522229789 key: 7480000000000001515F698000000000000004038000000000432DFC lock_ttl: 3008 txn_size: 2 use_async_commit: true min_commit_ts: 437063076522229790\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:913\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:755\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:645\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:382\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"]
[2022/11/01 08:43:56.124 +08:00] [WARN] [coprocessor.go:914] ["other error"] [conn=8233] [txnStartTS=437063106183823366] [regionID=22083063] [storeAddr=10.150.xx.xx:20160] [error="other error: key is locked (backoff or cleanup) primary_lock: 7480000000000002585F69800000000000000203800000000024FD73 lock_version: 437063106183823365 key: 7480000000000002585F69800000000000000203800000000024FDA7 lock_ttl: 3003 txn_size: 3 use_async_commit: true min_commit_ts: 437063106183823366"]
[2022/11/01 08:43:56.124 +08:00] [WARN] [coprocessor.go:914] ["other error"] [conn=8233] [txnStartTS=437063106183823366] [regionID=22083063] [storeAddr=10.150.xx.xx:20160] [error="other error: key is locked (backoff or cleanup) primary_lock: 7480000000000002585F69800000000000000203800000000024FD73 lock_version: 437063106183823365 key: 7480000000000002585F69800000000000000203800000000024FDA7 lock_ttl: 3003 txn_size: 3 use_async_commit: true min_commit_ts: 437063106183823366"]
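
For reference, the parameter change described above can be checked and applied as follows (a sketch of the poster's own steps; the poster also restarted the TiDB node afterwards):

```sql
-- Check the current setting, then restrict reads to leader replicas.
SHOW VARIABLES LIKE 'tidb_replica_read';
SET GLOBAL tidb_replica_read = 'leader';
```
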
| username: h5n1 | Original post link

Resolve lock runs on the leader, and the cleanup is replicated to the followers through the Raft log. During GC, resolve lock cleans up locks older than the GC safepoint; a read on the leader that finds an uncleaned secondary lock also triggers resolve lock. My speculation is that if a secondary lock on the leader is not accessed for a long time, it is not cleaned up promptly, so the cleanup cannot be replicated to the follower either. For real-time TP business it is better to read from the leader; for non-real-time tasks such as data extraction, you can consider setting up a separate TiDB node to query the followers.
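
If a dedicated TiDB node is set up for extraction jobs as suggested, follower reads could be enabled only in that node's sessions; a minimal sketch, not from the original post:

```sql
-- On the TiDB node dedicated to data-extraction jobs, allow that session to
-- read from followers; real-time TP sessions keep the default 'leader'.
SET SESSION tidb_replica_read = 'follower';    -- extraction sessions
-- SET SESSION tidb_replica_read = 'leader';   -- TP sessions (default)
```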

| username: 胡杨树旁 | Original post link

Isn’t MVCC supposed to ensure that reads and writes do not conflict? How can there be a read-write conflict here?

| username: Raymond | Original post link

Under the RR isolation level, read-write conflicts can occur in order to prevent phantom reads; under RC they do not.
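
A minimal sketch of switching a session to RC, per the reply above (in TiDB, the RC isolation level only takes effect in pessimistic transactions, which the poster is already using); the query is the one from the poster's log:

```sql
-- Switch this session to Read Committed; effective only for pessimistic
-- transactions in TiDB.
SET SESSION transaction_isolation = 'READ-COMMITTED';

BEGIN PESSIMISTIC;
SELECT MIN(CheckPoint)-1, MAX(CheckPoint) FROM po_xxx_01 WHERE CheckPoint > 3267009;
COMMIT;
```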

| username: erwadba | Original post link

I had originally hoped to relieve the pressure on the leader nodes through this parameter setting; it seems that is not feasible. Looking at the latest code, tidb_replica_read has more options now (closest-replicas/closest-adaptive), but I suspect this issue would still exist with them.

| username: alfred | Original post link

Per Follower Read | PingCAP 文档中心, Follower Read operates under the SI isolation level.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.