Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: txnLockFast — after the dev team upgraded to JDK 17, SELECT statements are also getting locked; how do I troubleshoot this?
[TiDB Usage Environment] Production Environment / Test / PoC
[TiDB Version]
[Encountered Issues]
[Reproduction Path] Operations performed that led to the issue
[Issue Phenomenon and Impact]
txnLockFast
| id | task | estRows | operator info | actRows | execution info | memory | disk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Limit_10 | root | 1 | offset:0, count:1 | 1 | time:19.3s, loops:2 | N/A | N/A |
| └─TableReader_17 | root | 1 | data:Limit_16 | 1 | time:19.3s, loops:1, cop_task: {num: 2, max: 8.2ms, min: 1.86ms, avg: 5.03ms, p95: 8.2ms, max_proc_keys: 1, p95_proc_keys: 1, tot_proc: 2ms, rpc_num: 20, rpc_time: 30.8ms, copr_cache_hit_ratio: 0.00}, ResolveLock:{num_rpc:18, total_time:411.6ms}, backoff{txnLockFast: 18.9s} | 656 Bytes | N/A |
| └─Limit_16 | cop[tikv] | 1 | offset:0, count:1 | 1 | tikv_task:{proc max:1ms, min:1ms, p80:1ms, p95:1ms, iters:2, tasks:2}, scan_detail: {total_process_keys: 1, total_process_keys_size: 213, total_keys: 4, rocksdb: {delete_skipped_count: 3, key_skipped_count: 7, block: {cache_hit_count: 1, read_count: 19, read_byte: 396.2 KB}}} | N/A | N/A |
| └─TableFullScan_15 | cop[tikv] | 1 | table:ex_trade, keep order:true, desc | 1 | tikv_task:{proc max:1ms, min:1ms, p80:1ms, p95:1ms, iters:2, tasks:2} | N/A | |
[Attachment]
Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.
What is this asking? I don’t understand
Under what circumstances will a select query be blocked by an update operation?
If the same `select * from aa for update` is executed concurrently, or is executed again before the previous lock has been released, the later statement will be blocked by the lock.
Similarly, if `select * from aa for update` runs at the same time as `insert into`, `update`, and similar statements against the same rows, the same blocking will occur.
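For illustration, a minimal sketch of that kind of read-write conflict; the table `aa`, column `val`, and key value are hypothetical and not from the original thread:

```sql
-- Session 1: write the row but keep the transaction open
BEGIN;
UPDATE aa SET val = 'x' WHERE id = 1;

-- Session 2: a locking read on the same row waits until session 1 finishes;
-- a plain SELECT that meets the prewrite lock of a committing write backs off
-- instead, which is what backoff{txnLockFast} in the plan above represents
SELECT * FROM aa WHERE id = 1 FOR UPDATE;

-- Session 1: committing (or rolling back) releases the lock
COMMIT;
```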
The prewrite phase of your update is taking too long, isn’t it? Try reducing the number of transactions. Also, are you using the RR isolation level? Could you switch to pessimistic mode + RC to reduce read-write conflicts?
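If you want to try that suggestion, a sketch of the relevant switches; this assumes a reasonably recent TiDB version, and only new sessions/transactions pick up the changes:

```sql
-- Run new transactions in pessimistic mode (already the default since TiDB v3.0.8)
SET GLOBAL tidb_txn_mode = 'pessimistic';

-- Use Read Committed isolation; in TiDB, RC only takes effect for pessimistic transactions
SET GLOBAL transaction_isolation = 'READ-COMMITTED';

-- Check the current values
SELECT @@global.tidb_txn_mode, @@global.transaction_isolation;
```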
The main reason it is slow is that the SQL itself is not written well. Use EXPLAIN (or EXPLAIN ANALYZE) to inspect the statement’s execution plan and look for full table scans or other inefficient operations. Also check whether there are any problems with the indexes.
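For example, against the `ex_trade` table from the plan above (the ORDER BY column is hypothetical, since the original query was not posted):

```sql
-- Actual execution plan, including cop task, backoff and scan details
EXPLAIN ANALYZE SELECT * FROM ex_trade ORDER BY id DESC LIMIT 1;

-- Check which indexes exist and whether the scan could use one
SHOW INDEX FROM ex_trade;
```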
Read-write conflicts in TiDB are still difficult to troubleshoot. The official documentation is also not good.
Indeed, this problem exists.
Setting the specific issue aside for a moment, some general thoughts:
I think the reason might be that TiDB retries internally, and as long as the backoff has not crossed a certain threshold, the product does not treat this behavior as a problem (so more detailed information only surfaces when the transaction actually fails, and so on).
If you want the product to improve, then once this issue is resolved you could propose on GitHub where and in what form you think the error should be surfaced. The community is still open, and this really is a troublesome area, so PingCAP should be happy to adopt such a proposal. (Complaining without constructive suggestions, on the other hand, means that even if the developers make fixes, they may not address your specific pain point.)
PS: just my personal opinion.
Coming back to this issue:
Actually, TxnLockFast can be traced. For example, after the transaction above fails, you can work backwards from the start_ts of the two transactions involved. In a PoC environment you can also enable the general log for case-by-case analysis.
The official documentation is not very detailed. Pessimistic transactions can be queried, but they disappear almost instantly, which makes this quite difficult. Can any experts explain how to query optimistic transactions?