TiCDC Stuck or Incremental Sync Slow After Creating Incremental Sync Task

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiCDC创建增量同步任务后卡住/增量同步缓慢

| username: 老司机1024

【TiDB Usage Environment】Production Environment
【TiDB Version】Main Cluster v6.1.1, Secondary Cluster v7.1.2
【Reproduction Path】
【Encountered Issues: Phenomenon and Impact】

  1. Exported the v6.1.1 cluster and imported it into the v7.1.2 cluster using the br command, then created a TiCDC task. When creating the task, tables without primary keys were ignored. The task status was normal and there were no error messages.
  2. Observed that incremental synchronization of the business tables was quite slow, with the task stuck at the same point for thirty minutes to several tens of minutes at a time.
  3. Used br v6.1.1 to export the full data and br v7.1.2 to import it into the secondary cluster, then created the incremental task with TiCDC/cdc v6.1.1. The business tables being synchronized were not among the tables ignored when the TiCDC task was created.
  4. Disabled GC before starting the full export with br, and did not re-enable GC after creating the incremental task. The changefeed timestamp used was the BackupTS printed after the br backup completed (see the command sketch after this list).
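
For reference, a minimal sketch of the workflow described above; hosts, paths, and the changefeed ID (`br-incremental`) are placeholders, and the BackupTS value comes from BR's own output:

```shell
# 1. Disable GC on the upstream cluster so the incremental data between
#    BackupTS and changefeed creation is not garbage-collected.
mysql -h <upstream-tidb> -P 4000 -u root -p -e "SET GLOBAL tidb_gc_enable = FALSE;"

# 2. Full backup with BR v6.1.1; note the BackupTS printed in the summary.
tiup br:v6.1.1 backup full --pd "<upstream-pd>:2379" --storage "local:///data/backup"

# 3. Restore into the v7.1.2 cluster with BR v7.1.2.
tiup br:v7.1.2 restore full --pd "<downstream-pd>:2379" --storage "local:///data/backup"

# 4. Create the changefeed with cdc v6.1.1, starting from the BackupTS.
tiup cdc:v6.1.1 cli changefeed create \
  --pd=http://<upstream-pd>:2379 \
  --sink-uri="mysql://root:<password>@<downstream-tidb>:4000/" \
  --changefeed-id="br-incremental" \
  --start-ts=<BackupTS>
```

Once the changefeed is running, TiCDC registers its own service GC safepoint on the upstream PD, so GC can normally be re-enabled at that point.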
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】
TiCDC log screenshot:

TiCDC task status screenshot:

TiCDC task details screenshot:

| username: 老司机1024 | Original post link

Could the lag be caused by the BR export/import, with the tables synchronized by TiCDC not being supported/compatible with BR-restored data? Should I re-export/import using the MySQL command line instead?

| username: 老司机1024 | Original post link

TiKV log screenshot:

| username: Billmay表妹 | Original post link

What are your upstream and downstream databases?

MySQL? TiDB?

| username: Fly-bird | Original post link

Data restored by BR cannot itself be synchronized through CDC, but after the BR restore, newly written data can be synchronized by CDC. Data having come from BR will not by itself cause CDC to lag. Check the resource utilization of your upstream and downstream clusters for anomalies.
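
To confirm the changefeed is actually advancing rather than stalled or erroring, you can query its checkpoint; a minimal sketch, with the PD address and changefeed ID as placeholders:

```shell
# Watch "checkpoint-ts"/"checkpoint-time" advance across repeated runs,
# and check the "error" field in the output for sink failures.
tiup cdc:v6.1.1 cli changefeed query \
  --pd=http://<upstream-pd>:2379 \
  --changefeed-id=br-incremental
```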

| username: 老司机1024 | Original post link

The upstream is TiDB v6.1.1 and the downstream is TiDB v7.1.2.

| username: 老司机1024 | Original post link

I just checked the upstream and downstream clusters and saw that the TiCDC nodes are fully loaded. I have already increased their specs and will keep observing. :+1:
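
If the existing TiCDC nodes stay saturated, scaling out TiCDC is another option besides scaling up; a sketch using tiup cluster, with the host IP and cluster name as placeholders:

```shell
# scale-out-cdc.yaml: add one more TiCDC node to the upstream cluster.
cat > scale-out-cdc.yaml <<'EOF'
cdc_servers:
  - host: 10.0.1.30   # placeholder IP for the new TiCDC node
EOF

tiup cluster scale-out <upstream-cluster-name> scale-out-cdc.yaml
```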

| username: TiDBer_小阿飞 | Original post link

When using TiCDC to synchronize data between two TiDB clusters, if the latency between upstream and downstream exceeds 100 ms:

  • For versions prior to v6.5.2, it is recommended to deploy TiCDC in the region (IDC, region) where the downstream TiDB cluster is located.
  • For v6.5.2 and later versions (after TiCDC was optimized), it is recommended to deploy TiCDC in the region (IDC, region) where the upstream cluster is located.

| username: 老司机1024 | Original post link

Thank you for the reply. Our current scenario is upgrading the main cluster from v6.1.1 to v7.1.2. To keep the business uninterrupted, we are first using BR + TiCDC to bring the new-version cluster in sync, and will then perform the cutover at a chosen time.
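
For the cutover itself, the usual pattern is to stop upstream writes, wait for the changefeed checkpoint to pass that moment, then switch the application over; a rough sketch (placeholders as above):

```shell
# After stopping business writes on the v6.1.1 cluster, wait until
# "checkpoint-time" in the changefeed status is later than the stop time.
tiup cdc:v6.1.1 cli changefeed list --pd=http://<upstream-pd>:2379

# Then point the application at the v7.1.2 cluster and remove the changefeed.
tiup cdc:v6.1.1 cli changefeed remove \
  --pd=http://<upstream-pd>:2379 \
  --changefeed-id=br-incremental
```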

| username: 老司机1024 | Original post link

Both clusters are on the same intranet; they do not cross AZs or regions.

| username: 老司机1024 | Original post link

Yesterday I added a TiCDC node (16C32G, with an SSD disk). I can see that TiCDC is continuously writing data to disk, but the downstream cluster is still syncing slowly. Could anyone offer other troubleshooting ideas? Thanks.
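
If TiCDC itself has headroom but downstream write throughput is the bottleneck, raising the MySQL-sink concurrency in the sink URI is one thing to try; a sketch using the documented `worker-count` and `max-txn-row` parameters (the values here are illustrative, not recommendations):

```shell
# Pause, update the sink URI with higher concurrency, then resume.
tiup cdc:v6.1.1 cli changefeed pause  --pd=http://<upstream-pd>:2379 -c br-incremental
tiup cdc:v6.1.1 cli changefeed update --pd=http://<upstream-pd>:2379 -c br-incremental \
  --sink-uri="mysql://root:<password>@<downstream-tidb>:4000/?worker-count=32&max-txn-row=1024"
tiup cdc:v6.1.1 cli changefeed resume --pd=http://<upstream-pd>:2379 -c br-incremental
```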

| username: 老司机1024 | Original post link

Upstream cluster load:


Downstream cluster load:

| username: 老司机1024 | Original post link

CDC task status:

| username: andone | Original post link

The official documentation is the most comprehensive.
https://docs.pingcap.com/zh/