Key is locked (will clean up) primary_lock in optimistic transactions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Key is locked (will clean up) primary_lock under optimistic transactions

| username: Hacker_Yv76YjBL

[TiDB Usage Environment] Production Environment
[TiDB Version] V4.0.9
[Reproduction Path] Cluster jitter in optimistic transaction mode
[Encountered Problem: Phenomenon and Impact]
The business runs its DML in optimistic transaction mode, and we see occasional database jitter every day. The logs show that keys are locked, and the affected SQL statements are forcibly killed by the cluster after 3 seconds (MAX_EXECUTION_TIME is set to 3 seconds), which causes the business jitter.
Question 1: What is the root cause of this behavior? We have not been able to find it.
Question 2: In optimistic transactions, if the txnLockFast keyword does not appear in the monitoring and logs, does it mean there is no read-write conflict?
Question 3: The csmessage table has a primary key and unique constraints, plus 7 indexes and JSON-typed fields; a single inserted row is about 5 KB, and such inserts spend a noticeable amount of time in the prewrite stage. Would switching to pessimistic transactions improve this?


[Resource Configuration]
8 TiDB Servers (16C64G) + 3 PD Servers (8C32G) + 15 TiKV Servers (16C64G, 2T)
[Attachments: Screenshots/Logs/Monitoring]
TiKV Logs:
[2024/04/16 10:22:18.146 +08:00] [INFO] [apply.rs:1653] [“add learner successfully”] [region=“id: 3918269 start_key: 7480000000000000FF8D5F698000000000FF0000020167727079FF51337964FF693748FF6762550000FD0380FF0000000000000003FF8000000000000084FF038000000006F979FFD400000000000000F8 end_key: 7480000000000000FF8D5F698000000000FF0000020167727079FF575F3835FF716D49FF7844410000FD0380FF0000000000000003FF8000000000000000FF038000000004B48EFF5800000000000000F8 region_epoch { conf_ver: 4169 version: 523 } peers { id: 72440460 store_id: 49536096 } peers { id: 72562810 store_id: 49536098 } peers { id: 72570023 store_id: 22204 }”] [peer=“id: 72589510 store_id: 8992685 is_learner: true”] [peer_id=72562810] [region_id=3918269]
[2024/04/16 10:22:18.155 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F69800000000000000101677270434941332DFF62355F624F340000FD0419B320258F09BC3003800000017B88469D lock_version: 449114049285981951 key: 7480000000000000B75F698000000000000007016664363036343162FF2D333665312D3433FF64632D626538312DFF3363393665333762FF6131366200000000FB03800000017B88469D lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:18.160 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 74800000000000008B5F728000000046F32FF1 lock_version: 449114049508802669 key: 74800000000000008B5F728000000046F32FF1 lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:18.193 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 74800000000000008B5F728000000046F32FF1 lock_version: 449114049508802669 key: 74800000000000008B5F728000000046F32FF1 lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:18.317 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 74800000000000008B5F72800000000048561E lock_version: 449114049836482580 key: 74800000000000008B5F72800000000048561E lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:18.408 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 74800000000000008B5F69800000000000000101677270694B4C3534FF4364367969550000FD038AD89484A6002000 lock_version: 449114049390839117 key: 74800000000000008B5F69800000000000000201677270694B4C3534FF4364367969550000FD03800000004BF8C87A lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:18.918 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F69800000000000000101677270694E6B4676FF70514346656B0000FD0419B32025920C9A9003800000017B87D7BC lock_version: 449114050164165210 key: 7480000000000000B75F69800000000000000601677270694E6B4676FF70514346656B0000FD038000000000000050 lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:19.101 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F698000000000000001016772706C4A4B6A63FF5475546B4C510000FD0419B320259303EBE803800000017B876A54 lock_version: 449114050190379217 key: 7480000000000000B75F698000000000000006016772706C4A4B6A63FF5475546B4C510000FD03800000000000001C lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:19.119 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F698000000000000001016772706C4A4B6A63FF5475546B4C510000FD0419B320259303EBE803800000017B876A54 lock_version: 449114050190379217 key: 7480000000000000B75F698000000000000006016772706C4A4B6A63FF5475546B4C510000FD03800000000000001C lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:19.180 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F698000000000000001016772707A7A713052FF48656B474A730000FD0419B32025930449A803800000017B88473D lock_version: 449114050242807795 key: 7480000000000000B75F698000000000000006016772707A7A713052FF48656B474A730000FD038000000000000008 lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:19.189 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000B75F698000000000000001016772705569655566FF596F79356F300000FD0419B3202593049F9803800000017B88ABED lock_version: 449114050242806941 key: 7480000000000000B75F698000000000000006016772705569655566FF596F79356F300000FD0380000000000003A4 lock_ttl: 3000 txn_size: 1”]
[2024/04/16 10:22:19.190 +08:00] [WARN] [endpoint.rs:530] [error-response] [err=“Key is locked (will clean up) primary_lock: 74800000000000008B5F728000000034931AAA lock_version: 449114050190378030 key: 74800000000000008B5F728000000034931AAA lock_ttl: 3204 txn_size: 1”]
MOK Information (screenshots not included)

Monitoring Information (screenshots not included)

| username: oceanzhang | Original post link

It seems that there isn’t much of a problem with the SQL.

| username: WalterWj | Original post link

Question 1: What is the root cause of this behavior? We have not been able to find it.
Question 2: In optimistic transactions, if the txnLockFast keyword does not appear in the monitoring and logs, does it mean there is no read-write conflict?
Question 3: The csmessage table has a primary key and unique constraints, plus 7 indexes and JSON-typed fields; a single inserted row is about 5 KB, and such inserts spend a noticeable amount of time in the prewrite stage. Would switching to pessimistic transactions improve this?

For question 2: in TiDB, the absence of the txnLockFast keyword is most likely because you have configured a max execution time, so the SQL is killed before it reaches the txnLockFast backoff recording logic. You can remove the execution time limit to observe and troubleshoot.
Adjusting to pessimistic transactions can be tried, but the benefits may not be significant.
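
A minimal sketch of the two session-level knobs mentioned above, assuming the 3-second limit is applied through the max_execution_time system variable rather than a per-statement hint; the connection parameters and the pymysql driver are placeholders, and tidb_txn_mode has been available since TiDB 3.0.8, so it works on v4.0.9:

```python
import pymysql

# Placeholder connection settings; adjust to the real cluster.
conn = pymysql.connect(host="tidb-host", port=4000, user="app",
                       password="***", database="test")
try:
    with conn.cursor() as cur:
        # max_execution_time is in milliseconds; 0 means "no limit".
        cur.execute("SELECT @@session.max_execution_time")
        print("current limit (ms):", cur.fetchone()[0])

        # Lift the limit for this troubleshooting session only, so a
        # conflicting statement can live long enough to leave a
        # txnLockFast backoff trace in the logs/monitoring.
        cur.execute("SET @@session.max_execution_time = 0")

        # Optionally try pessimistic transactions for this session only,
        # leaving the cluster-wide default untouched.
        cur.execute("SET @@session.tidb_txn_mode = 'pessimistic'")
finally:
    conn.close()
```
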
Reducing the size of transactions, especially the number of write operations within a transaction, can effectively reduce conflicts and lock wait times.
Since your workload runs in optimistic mode, in high-conflict scenarios you might also consider adjusting the business logic appropriately or implementing retry logic at the application layer.
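
To make the last two suggestions concrete, here is a hedged sketch of application-side batching plus retry. The table and column names (csmessage with msg_id and payload columns), the batch size, and the backoff values are illustrative assumptions; error code 9007 is what TiDB returns for an optimistic-transaction write conflict at commit time.

```python
import time
import pymysql

BATCH_SIZE = 50        # keep each transaction small to shorten prewrite
MAX_RETRIES = 3
WRITE_CONFLICT = 9007  # TiDB "Write conflict" error code

def insert_rows(conn, rows):
    """Insert rows in small batches, retrying a batch on write conflict."""
    for i in range(0, len(rows), BATCH_SIZE):
        batch = rows[i:i + BATCH_SIZE]
        for attempt in range(MAX_RETRIES):
            try:
                with conn.cursor() as cur:
                    cur.executemany(
                        "INSERT INTO csmessage (msg_id, payload) VALUES (%s, %s)",
                        batch)
                conn.commit()
                break
            except pymysql.MySQLError as exc:
                conn.rollback()
                code = exc.args[0] if exc.args else None
                if code == WRITE_CONFLICT and attempt < MAX_RETRIES - 1:
                    time.sleep(0.05 * (attempt + 1))  # brief backoff, then retry
                    continue
                raise
```

Because a transaction that fails at commit is rolled back before the retry, re-executing the same batch does not double-insert rows.
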

So, first remove the execution time limit, see whether you can find relevant information in tidb.log, and optimize the business model. Reference: Troubleshoot Lock Conflicts in TiDB | PingCAP Archived Documentation

| username: tidb狂热爱好者 | Original post link

Here are a few suggestions for you:

  1. If you are pursuing high concurrency, it is recommended to split the table in two: a small table that records the relationships and a large table that stores the JSON (a rough sketch of this split is shown after this list).
  2. For tables with demanding write latency, do not create too many indexes. TiDB is a distributed database and its indexes are distributed as well: an insert with no index takes about 2 ms, with one index about 10 ms, and with a unique index plus seven other indexes the write time can easily reach 100 ms.
    For statistics and analysis you can use TiFlash, or copy the data elsewhere and analyze it there.
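
For suggestion 1, a rough sketch of what the split could look like. All names (msg_index, msg_body, msg_id, and so on) are hypothetical rather than the poster's actual csmessage schema: the narrow table keeps the lookup columns and the few indexes that really need to exist, while the large JSON payload lives in a separate table keyed by the same id.

```python
import pymysql

# Hypothetical DDL illustrating the split; adjust names/types to the real schema.
DDL = [
    """CREATE TABLE IF NOT EXISTS msg_index (
           msg_id   BIGINT PRIMARY KEY,
           user_id  BIGINT NOT NULL,
           status   TINYINT NOT NULL,
           created  DATETIME NOT NULL,
           KEY idx_user (user_id),
           KEY idx_created (created)
       )""",
    """CREATE TABLE IF NOT EXISTS msg_body (
           msg_id  BIGINT PRIMARY KEY,  -- same key as msg_index
           body    JSON NOT NULL        -- the ~5 KB JSON payload lives here only
       )""",
]

conn = pymysql.connect(host="tidb-host", port=4000, user="app",
                       password="***", database="test")
with conn.cursor() as cur:
    for stmt in DDL:
        cur.execute(stmt)
conn.close()
```
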
| username: shigp_TIDBER | Original post link

I can feel the original poster's urgency.