After a transaction commit times out at 41 seconds, cleanup_secondary_failure_rollback shows an anomaly at the same moment in the Tikv_Errors panel of TiDB monitoring

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 事物提交超时41s后,Tidb监控Tikv_Erorrs中cleanup_secondary_failure_rollback同一时刻出现异常

| username: ArtisanChou



The above image shows TiDB monitoring Tikv_Errors


The above image shows the most recent transaction commit exception. The reported cause is a PD timeout (the same error is reported every time), yet the PD connection is normal.


The above image shows PD connection monitoring.

Question: How can this type of transaction commit timeout error be resolved?
As far as I know, tidb_config currently has tikv_client.commit_timeout=41s, and the lock wait timeout innodb_lock_wait_timeout is at its default of 50s. Would raising these two timeouts help? Is the cleanup_secondary_failure_rollback anomaly caused by GC?

| username: tidb狂热爱好者 | Original post link

You have slow SQL statements, and the KV requests keep retrying.

| username: ArtisanChou | Original post link

How can I locate which TiKV is retrying? Can it be identified through monitoring? Would restarting the machine solve this kind of problem?
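
On the monitoring side, one possible approach is to compare gRPC latency per TiKV instance in Prometheus and see whether one node stands out. The following is only a minimal sketch, not a confirmed diagnosis procedure for this cluster. It assumes that TiKV exposes a histogram named tikv_grpc_msg_duration_seconds_bucket with `type` and `instance` labels, and that Prometheus is reachable at an address such as http://127.0.0.1:9090 (a placeholder). The function names are made up for illustration; only the `/api/v1/query` endpoint and its JSON response shape are taken from the Prometheus HTTP API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: TiKV exposes its gRPC latency histogram under this name with
# `type` and `instance` labels. Verify this against your own Prometheus.
METRIC = "tikv_grpc_msg_duration_seconds_bucket"


def build_query(grpc_type="kv_prewrite", window="1m", quantile=0.99):
    """Return PromQL for the per-instance latency quantile of one gRPC type."""
    return (
        f"histogram_quantile({quantile}, "
        f'sum(rate({METRIC}{{type="{grpc_type}"}}[{window}])) by (le, instance))'
    )


def build_url(prometheus_base, query):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return prometheus_base.rstrip("/") + "/api/v1/query?" + params


def rank_instances(response):
    """Return (instance, seconds) pairs from a vector result, slowest first."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    rows = []
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "unknown")
        value = float(sample["value"][1])
        if value == value:  # NaN never equals itself, so this skips empty series
            rows.append((instance, value))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def slowest_tikv(prometheus_base, grpc_type="kv_prewrite"):
    """Query Prometheus and rank TiKV instances by gRPC latency."""
    url = build_url(prometheus_base, build_query(grpc_type))
    with urllib.request.urlopen(url, timeout=10) as resp:
        return rank_instances(json.load(resp))
```

Calling `slowest_tikv("http://127.0.0.1:9090", "kv_commit")` would rank instances by commit latency instead of prewrite latency. A single instance far above the others would be a hint worth following up in that node's logs, though it does not by itself prove that node is the one being retried.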

| username: Billmay表妹 | Original post link

Upgrade to a newer version if at all possible. Most people no longer run the older versions, and many issues cannot be reproduced without the same version, which slows down troubleshooting. Many issues in older versions have also already been fixed in newer ones, so the answer to many of your questions may simply be: it is a bug, and upgrading will fix it.