txnLockFast: after the application was upgraded to JDK 17, SELECT statements are also being blocked. How can this be investigated?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: txnLockFast 开发升级jdk17后出现select也被锁住的情况 请问怎么排查

| username: tidb狂热爱好者

[TiDB Usage Environment] Production Environment / Test / Poc
[TiDB Version]
[Encountered Issues]
[Reproduction Path] Operations performed that led to the issue
[Issue Phenomenon and Impact]
txnLockFast

| id | task | estRows | operator info | actRows | execution info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Limit_10 | root | 1 | offset:0, count:1 | 1 | time:19.3s, loops:2 | N/A | N/A |
| └─TableReader_17 | root | 1 | data:Limit_16 | 1 | time:19.3s, loops:1, cop_task: {num: 2, max: 8.2ms, min: 1.86ms, avg: 5.03ms, p95: 8.2ms, max_proc_keys: 1, p95_proc_keys: 1, tot_proc: 2ms, rpc_num: 20, rpc_time: 30.8ms, copr_cache_hit_ratio: 0.00}, ResolveLock:{num_rpc:18, total_time:411.6ms}, backoff{txnLockFast: 18.9s} | 656 Bytes | N/A |
| └─Limit_16 | cop[tikv] | 1 | offset:0, count:1 | 1 | tikv_task:{proc max:1ms, min:1ms, p80:1ms, p95:1ms, iters:2, tasks:2}, scan_detail: {total_process_keys: 1, total_process_keys_size: 213, total_keys: 4, rocksdb: {delete_skipped_count: 3, key_skipped_count: 7, block: {cache_hit_count: 1, read_count: 19, read_byte: 396.2 KB}}} | N/A | N/A |
| └─TableFullScan_15 | cop[tikv] | 1 | table:ex_trade, keep order:true, desc | 1 | tikv_task:{proc max:1ms, min:1ms, p80:1ms, p95:1ms, iters:2, tasks:2} | N/A | |
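
Judging from this plan, almost all of the 19.3s was spent in txnLockFast backoff (18.9s) rather than scanning: ResolveLock was issued 18 times while only a single key was actually processed. A minimal sketch for finding other statements hit by the same kind of backoff, assuming your version's INFORMATION_SCHEMA.SLOW_QUERY table exposes the Backoff_time / Backoff_types columns (check your release's slow query schema):

```sql
-- Hedged sketch: find recent slow statements whose latency was dominated by
-- lock-related backoff (txnLock / txnLockFast) rather than by actual execution.
SELECT Time, Txn_start_ts, Query_time, Backoff_time, Backoff_types, Query
FROM INFORMATION_SCHEMA.SLOW_QUERY
WHERE Backoff_types LIKE '%txnLock%'
ORDER BY Time DESC
LIMIT 20;
```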

[Attachment]

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: xfworld | Original post link

What is this asking? I don’t understand :joy:

| username: tidb狂热爱好者 | Original post link

Under what circumstances will a select query be blocked by an update operation?

| username: xfworld | Original post link

If the same select * from aa for update statement is executed concurrently, or executed again before the previous lock has been released, the later execution will be blocked by the lock.

Similarly, if select * from aa for update runs at the same time as insert into, update, and other write statements on the same rows, the same blocking occurs.
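
To make the second case concrete, here is a minimal two-session sketch under pessimistic transactions; the table aa and its columns are purely illustrative:

```sql
-- Session 1: takes a pessimistic lock on the row and holds it until COMMIT/ROLLBACK.
BEGIN;
SELECT * FROM aa WHERE id = 1 FOR UPDATE;

-- Session 2: touches the same row and blocks until session 1 releases the lock,
-- or gives up once the lock wait timeout is reached.
BEGIN;
UPDATE aa SET val = val + 1 WHERE id = 1;   -- waits here

-- Session 1: releasing the lock lets session 2 continue.
COMMIT;
```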

| username: 人如其名 | Original post link

Your prewrite time for the update is too long, isn’t it? Try reducing the number of transactions. Also, are you using the RR isolation level? Can you switch to Pessimistic + RC to reduce read-write conflicts?
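
If you want to try that, a rough sketch of the relevant switches (defaults and scopes vary by TiDB version, and READ-COMMITTED only takes effect for pessimistic transactions):

```sql
-- Check the current transaction mode and isolation level.
SHOW VARIABLES LIKE 'tidb_txn_mode';
SHOW VARIABLES LIKE 'transaction_isolation';

-- Make new sessions use pessimistic transactions with Read Committed.
SET GLOBAL tidb_txn_mode = 'pessimistic';
SET GLOBAL transaction_isolation = 'READ-COMMITTED';
```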

| username: tidb狂热爱好者 | Original post link

The main reason for the slow speed is that the SQL statement is not written well. You can use the EXPLAIN command to analyze the execution plan of the SQL statement and see if there are any full table scans or other inefficient operations. Additionally, you can check if there are any issues with the indexes.
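
For reference, a sketch of that check against the table in the plan above; the exact query text is an assumption based on the plan shape (a descending ordered scan of ex_trade with LIMIT 1), and the column name id is hypothetical:

```sql
-- Re-check the plan and actual runtime of the suspect statement
-- (query text guessed from the plan shape; adjust to the real SQL).
EXPLAIN ANALYZE SELECT * FROM ex_trade ORDER BY id DESC LIMIT 1;

-- See which indexes exist on the table.
SHOW INDEX FROM ex_trade;
```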

| username: jansu-dev | Original post link

  1. txnLockFast indicates that there are read-write conflicts in the cluster.
  2. You can read the official guide on handling lock conflicts: TiDB 锁冲突问题处理 | PingCAP 文档中心 (Troubleshoot Lock Conflicts in TiDB).
  3. However, a low level of read-write conflicts is considered normal in TiDB; you can check the KV Backoff OPS monitoring panel to see how frequent they are.
  4. If you want to trace back exactly which two transactions are conflicting, you can try increasing the load so that a later transaction eventually fails; the error log will then contain the start-ts of both transactions, which lets you analyze the business model and understand why the read-write conflict occurred (see the sketch after this list).
  5. Alternatively, you can try enabling debug logs for the analysis, though this is just a guess on my part; it seems like that should log the relevant information.
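
As a sketch for point 4: once an error or log line gives you the two start-ts values, they can be converted to wall-clock time with TIDB_PARSE_TSO (available in recent TiDB versions) and then correlated with business logs; the TSO value below is only a placeholder:

```sql
-- Convert a start_ts (TSO) taken from the error log into a readable timestamp
-- so it can be matched against application logs. The value is a placeholder.
SELECT TIDB_PARSE_TSO(434041888384811010) AS txn_start_time;
```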

| username: tidb狂热爱好者 | Original post link

Read-write conflicts in TiDB are still difficult to troubleshoot. The official documentation is also not good.

| username: jansu-dev | Original post link

Indeed, this problem exists. :+1:

Setting this specific issue aside for a moment:
I think the reason is that TiDB retries internally, and as long as the backoff stays below a certain level the product does not treat this behavior as a problem (so more detailed information only appears once the transaction actually fails, etc.).
If you want the product to improve, then after this issue is resolved you could propose on GitHub where and in what form you think the error should be surfaced. The community is open, and since this is indeed a rather troublesome issue, PingCAP should be happy to adopt the suggestion. (But if we only complain without constructive suggestions, then even if the developers make some fixes, they may not target your specific pain point.)

PS: Just personal opinion

Looking at this issue itself:
TxnLockFast can actually be traced. For example, after the transaction above fails, you can backtrack the start-ts of the two conflicting transactions. Or, in a POC environment, you can enable the general log for case-by-case analysis.
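
A sketch of the general-log option, assuming a test/POC instance; the SET syntax for tidb_general_log has varied between versions, and the output is very verbose:

```sql
-- Log every statement this TiDB instance receives to the tidb log
-- (on some versions the variable must be set with GLOBAL scope instead).
SET tidb_general_log = ON;

-- ... reproduce the read-write conflict here ...

-- Turn it back off to avoid flooding the log.
SET tidb_general_log = OFF;
```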

| username: tidb狂热爱好者 | Original post link

The official documentation is not very detailed. Pessimistic transactions can be queried, but the information disappears almost instantly, which makes this quite difficult. Can any experts explain how to query optimistic transactions?
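
For what it's worth, a sketch of the live views that can be polled while a wait is actually happening, assuming TiDB v5.1 or later where these INFORMATION_SCHEMA tables exist; they only reflect the current instant, which is why the information seems to vanish so quickly (optimistic transactions leave no persistent lock to query, so their conflicts mostly surface as backoff):

```sql
-- Transactions currently running on every TiDB instance, with start times
-- and SQL digests (a moving target: poll it while the wait is in progress).
SELECT * FROM INFORMATION_SCHEMA.CLUSTER_TIDB_TRX;

-- Pessimistic lock waits in progress right now: which transaction is waiting
-- and which transaction currently holds the lock.
SELECT * FROM INFORMATION_SCHEMA.DATA_LOCK_WAITS;
```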