Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: ticdc同步任务无报错,同步下游tso不推进
[TiDB Usage Environment] Production Environment
[TiDB Version] ticdc
[Reproduction Path] Operations performed that led to the issue
ticdc checkpointTs is not advancing. Attempted pause and resume but it did not recover. Attempted to restart the cdc component using tiup but it did not recover. Cleared the sync task and recreated the sync task but it still did not recover. There are no task errors in the sync task.
[Problem Encountered: Symptoms and Impact]
![wecom-temp-438014f2f992dd70babfb317ba9711c3|690x149]
(upload://yXYwaNrxl4kpdqhxjYXtFoEaFc6.png)
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Check the CDC logs to see what the specific error is.
The image is not visible. Please provide the text you need translated.
Check if there are any regions in the cluster that are in an orphaned state… The cluster’s status needs to be restored before it can be used.
The log describes the region’s identifier, and you can use the identifier to check.
It seems there is a leader.
The image is not visible. Please provide the text you need translated.
There are two regions that were not found.
That means the state is inconsistent, which is the reason why CDC is not working.
First, fix the cluster’s state. Check issues related to regions such as empty, miss, peer, etc. A thorough investigation is needed.
How can I fix this situation? Is there a reference document link?
Refer to this:
I don’t know what operations the cluster has undergone to result in this situation.
You can only try to make up for it as much as possible. If you find that it’s not a replica loss but a replica failure, you can manually delete it. (It is recommended to back up before performing this operation)
The cluster is currently in an available state, but there are some concurrent writes in the business, and there are quite a few transaction conflicts. I’m not sure if this has an impact.
After re-backing up and importing, CDC synchronized some data. Currently, it is not advancing TSO, but the error reported is different.
ticdc and tidb are compatible with major versions, right? It’s best to use the same version…
The error described in the logs is basically related to PD, unable to save checkpoint…
The same, both are 4.0.16
Hello, according to the logs, the upstream actively closed the connection. Please check the upstream TiKV logs.
[2022/11/16 10:58:09.759 +08:00] [Error] [router.rs:174] [“failed to send significant msg”] [msg=LeaderCallback(Callback::Read(…))]
[2022/11/16 10:59:46.419 +08:00] [Error] [router.rs:174] [“failed to send significant msg”] [msg=“CaptureChange { cmd: RegisterObserver { observe_id: ObserveID(2564262), region_id: 1255104, enabled: true }, region_epoch: conf_ver: 27247 version: 6891, callback: Callback::Read(…) }”]
[2022/11/16 11:00:50.238 +08:00] [Error] [router.rs:174] [“failed to send significant msg”] [msg=“CaptureChange { cmd: RegisterObserver { observe_id: ObserveID(2564263), region_id: 1247613, enabled: true }, region_epoch: conf_ver: 422 version: 7259, callback: Callback::Read(…) }”]
[2022/11/16 11:06:15.951 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=7629]
[2022/11/16 11:06:16.083 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=7631]
[2022/11/16 11:06:16.111 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=7630]
[2022/11/16 11:13:01.586 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=7944]
[2022/11/16 11:17:40.535 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=12664]
[2022/11/16 11:17:40.649 +08:00] [Error] [endpoint.rs:1113] [“cdc send scan event failed”] [req_id=12655]
At present, there are still region_not_found issues. Not sure if this is the cause.
Hello, please provide the complete TiKV logs corresponding to the time when the ticdc error occurred.