[Overview] Scenario + Problem Overview
The workload queries forum articles. The error generally does not affect the application, but the application does report errors similar to the post linked below, and some queries are affected. Are there any mitigation methods?
Related post: Key is locked (will clean up) primary_lock: - TiDB - TiDB Q&A Community (asktug.com)
The transaction execution timed out. Reducing the amount of work done in each transaction, or splitting it into smaller batches, can effectively mitigate this issue.
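As a rough sketch of batching (the table name forum_article and the predicate are hypothetical, not from the original post), a large write can be split into small transactions that are repeated until no rows are affected:

```sql
-- Hypothetical sketch: split one large write into smaller transactions.
-- Table and predicate are assumed for illustration only.
-- Re-run this statement until it affects 0 rows.
DELETE FROM forum_article
WHERE is_archived = 1
ORDER BY id
LIMIT 1000;
```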
This error is caused by read-write conflicts in the workload. key is locked indicates a read-write conflict: a read request encounters uncommitted data and has to wait for the write to commit before it can read. A small number of these errors has no impact on the business, but a large number indicates severe read-write conflicts.
Solution: switch to the pessimistic transaction mode instead of the optimistic one.
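A minimal sketch of the switch (pessimistic mode is already the default in recent TiDB versions; verify against your version's documentation):

```sql
-- Make pessimistic mode the default for new sessions.
SET GLOBAL tidb_txn_mode = 'pessimistic';

-- Or opt in for a single transaction.
BEGIN PESSIMISTIC;
-- ... DML statements ...
COMMIT;
```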
My personal guess for the probable cause is as follows:
The cluster is currently configured with set global tidb_replica_read='leader-and-follower'. The follower may still hold the lock information for that primary key, which triggers the error; the error entries in tidb.log show the follower's address in storeAddr. After changing the parameter with set global tidb_replica_read='leader' and restarting the TiDB node, the alert disappears.
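A minimal sketch of the change described above:

```sql
-- Check the current read routing, then send reads back to the leader only.
SHOW VARIABLES LIKE 'tidb_replica_read';
SET GLOBAL tidb_replica_read = 'leader';
-- New sessions pick up the new value; in this case the TiDB node was also restarted.
```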
Resolve lock happens on the leader, and the cleanup is replicated to the followers through the Raft log. During GC, locks older than the GC safepoint are resolved. A read on the leader that encounters an uncleaned secondary lock will also trigger resolve lock. My speculation is that if a secondary lock on the leader is not accessed for a long time, its cleanup may be delayed and therefore not replicated to the follower in time. For real-time TP business it is better to read from the leader; for non-real-time tasks such as data extraction, consider setting up a dedicated TiDB node that reads from the followers, for example as sketched below.
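For example, on a TiDB node (or session) dedicated to extraction jobs, follower read can be enabled at the session level while the TP-facing TiDB nodes keep reading from the leader; the table and columns below are hypothetical:

```sql
-- On the extraction-only session/connection:
SET SESSION tidb_replica_read = 'follower';

-- Hypothetical extraction query served by follower replicas.
SELECT id, title, created_at
FROM forum_article
WHERE created_at < CURDATE() - INTERVAL 1 DAY;
```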
I originally wanted to offload read pressure from the leader through this parameter, but that does not seem feasible. Looking at the latest code, tidb_replica_read has more options (closest-replicas / closest-adaptive); I suspect the same issue might still exist with them.
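If you want to experiment with those newer options, the statements look like this (their availability and exact behavior depend on the TiDB version, so treat this as a sketch):

```sql
-- Prefer replicas close to the TiDB node; supported values vary by version.
SET GLOBAL tidb_replica_read = 'closest-adaptive';
-- or
SET GLOBAL tidb_replica_read = 'closest-replicas';
```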