Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: ticdc的min resolved ts有较长的滞后 (TiCDC's min resolved ts lags for a long time)

[TiDB Usage Environment] Production environment, 4 TiKV nodes
[TiDB Version] TiKV 6.1
[Reproduction Path]
Subscribe to CDC events and, based on their content, write to TiKV with txnkv.
Run range delete operations concurrently with the writes above.
[Encountered Issue: Phenomenon and Impact]
When a transaction writes to TiKV, the client reports this error:
[ERROR] [commit.go:182] [“2PC failed commit key after primary key committed”] [error=“Error(Txn(Error(Mvcc(Error(TxnLockNotFound { start_ts: TimeStamp(439331133747363841), commit_ts: TimeStamp(439331134009508037), key: [0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 88, 195, 0, 0, 1, 132, 128, 217, 196, 16, 151, 5, 146, 83, 118, 61, 0, 0] })))))”] [errorVerbose=“Error(Txn(Error(Mvcc(Error(TxnLockNotFound { start_ts: TimeStamp(439331133747363841), commit_ts: TimeStamp(439331134009508037), key: [0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 88, 195, 0, 0, 1, 132, 128, 217, 196, 16, 151, 5, 146, 83, 118, 61, 0, 0] })))))\ngithub.com/tikv/client-go/v2/error.ExtractKeyErr\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220531092439-efebaeb9fe53/error/error.go:259\ngithub.com/tikv/client-go/v2/txnkv/transaction.actionCommit.handleSingleBatch\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220531092439-efebaeb9fe53/txnkv/transaction/commit.go:171\ngithub.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker.func1\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220531092439-efebaeb9fe53/txnkv/transaction/2pc.go:1993\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571”] [txnStartTS=439331133747363841] [commitTS=439331134009508037] [keys=“[000000000000000b00004e1900000186345e80b0fee6f0a456410000,000000000000000b000058c30000018480d9c41097059253763d0000,000000000000000b00005a1f000001863424e69897059253762c0000]”] [stack=“github.com/tikv/client-go/v2/txnkv/transaction.actionCommit.handleSingleBatch\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220531092439-efebaeb9fe53/txnkv/transaction/commit.go:182\ngithub.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker.func1\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220531092439-efebaeb9fe53/txnkv/transaction/2pc.go:1993”]
From this point on, the service stops receiving CDC events. The monitoring shows that the min resolved ts of one machine kept lagging and did not advance.
Judging from the Go TiKV client (client-go) code, the error "2PC failed commit key after primary key committed" looks like it could indicate a very serious bug. Is that the case?
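For anyone trying to line the error up with the monitoring graphs, here is a minimal sketch that splits the start_ts and commit_ts from the error into wall-clock time and logical counter. It assumes the usual TSO layout (the low 18 bits are a logical counter, the high bits are Unix milliseconds); `splitTSO` is a hypothetical helper name, not a client-go function.

```go
package main

import (
	"fmt"
	"time"
)

// physicalShiftBits is the number of low bits assumed to hold the logical
// counter; the remaining high bits are assumed to be Unix milliseconds.
const physicalShiftBits = 18

// splitTSO splits a TSO timestamp into its physical (ms) and logical parts.
func splitTSO(ts uint64) (physicalMs int64, logical int64) {
	physicalMs = int64(ts >> physicalShiftBits)
	logical = int64(ts & ((1 << physicalShiftBits) - 1))
	return physicalMs, logical
}

func main() {
	startTS := uint64(439331133747363841)  // start_ts from the error
	commitTS := uint64(439331134009508037) // commit_ts from the error

	startMs, startLogical := splitTSO(startTS)
	commitMs, commitLogical := splitTSO(commitTS)

	fmt.Println("start :", time.UnixMilli(startMs).UTC(), "logical", startLogical)
	fmt.Println("commit:", time.UnixMilli(commitMs).UTC(), "logical", commitLogical)
	fmt.Println("commit - start:", time.Duration(commitMs-startMs)*time.Millisecond)
}
```

Under that assumption the physical parts of the two timestamps are 1000 ms apart, which gives a time window to look for on the min resolved ts panel.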
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
TiCDC monitoring
The green part of the graph is the one that kept lagging; it recovered on its own later.
TiKV-related logs
[INFO] [commit.rs:67] [“txn conflict (lock not found)”] [commit_ts=439331134009508037] [start_ts=439331133747363841] [key=000000000000000BFF000058C300000184FF80D9C41097059253FF763D000000000000FB]
[WARN] [errors.rs:339] [“txn conflicts”] [err=“Error(Txn(Error(Mvcc(Error(TxnLockNotFound { start_ts: TimeStamp(439331133747363841), commit_ts: TimeStamp(439331134009508037), key: [0, 0, 0, 0, 0, 0, 0, 11, 0, 0, 88, 195, 0, 0, 1, 132, 128, 217, 196, 16, 151, 5, 146, 83, 118, 61, 0, 0] })))))”]