TiCDC Data Synchronization is Very Slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc同步数据很慢

| username: TiDBer_vC2lhR9G

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.0
[Encountered Problem: Phenomenon and Impact]
Our company needs to migrate from Alibaba Cloud to another cloud service provider, so the database needs to be migrated.
I first used BR for the full migration and then TiCDC to replicate the incremental data. The incremental replication was extremely slow: only a few minutes' worth of data was replicated over several hours. The checkpoint stayed at yesterday’s timestamp even though the task status was normal, as shown in the figure:


The target TiDB cluster was also receiving only a few hundred write QPS, which is very slow. Please help take a look.

[Resource Configuration]
Source Configuration:


cdc: 16C32G 500G SSD * 1
tidb: 16C32G 500G SSD * 3
tikv: 24C48G 1.3T SSD * 3

Target Configuration:


tidb: 32C64G 500G SSD * 2
tikv: 32C64G 1.5T SSD * 3

The target machines are more powerful and the cluster is newly built, so write performance on the target should not be the problem.

The migration runs over the public network! There is no alternative, because it crosses cloud service providers. Could this be the cause? I saw this factor mentioned in the documentation.

| username: xfworld

Is TiCDC hooked up to Prometheus? The TiCDC metrics make it relatively easy to tell which of these it is:

  1. Upstream issues?
  2. Downstream issues?
  3. Network bottlenecks?

Once you have a suspect, check the TiCDC logs for any warnings or errors…
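
To put a number on how far behind the changefeed is, the checkpoint TSO can be converted to wall-clock time. The sketch below relies on the TiDB TSO layout (a physical timestamp in milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names are made up for illustration, and the TSO value has to be copied from the changefeed status yourself.

```python
from datetime import datetime, timezone
from typing import Optional

TSO_LOGICAL_BITS = 18  # the low 18 bits of a TiDB TSO are a logical counter


def tso_to_datetime(tso: int) -> datetime:
    """Return the UTC wall-clock time encoded in the physical part of a TSO."""
    physical_ms = tso >> TSO_LOGICAL_BITS  # drop the logical counter
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def checkpoint_lag_seconds(checkpoint_tso: int,
                           now: Optional[datetime] = None) -> float:
    """Seconds between `now` (default: current UTC time) and the checkpoint."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - tso_to_datetime(checkpoint_tso)).total_seconds()
```

Sampling `checkpoint_lag_seconds` a few minutes apart shows whether the lag is shrinking, flat, or growing, which is easier to reason about than a raw timestamp.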

| username: TiDBer_vC2lhR9G

Nothing in the TiCDC logs obviously points to slowness. The downstream is reached over a 200M public broadband link, and the load on the upstream machines looks fine as well.

| username: Minorli-PingCAP

In addition to bandwidth, factors such as latency and packet loss rate will also affect public network migration. Especially if the network quality is poor and retransmissions occur, performance will be very poor or even unusable. You can use iperf to test the network quality on both sides.
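
As a rough illustration of why latency and loss can matter more than the nominal bandwidth, the sketch below applies the Mathis approximation for the steady-state throughput of a single TCP connection, roughly (MSS / RTT) × (C / √p). It is a rule of thumb rather than a measurement, and the RTT and loss values used with it here are hypothetical, not taken from this thread.

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_s: float,
                           loss_rate: float, c: float = 1.22) -> float:
    """Approximate upper bound on one TCP connection's throughput in Mbit/s.

    Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)), with C ~ 1.22.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    bits_per_second = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return bits_per_second / 1e6


# Hypothetical public-network path: 1460-byte MSS, 40 ms RTT, 1% packet loss.
estimate = mathis_throughput_mbps(1460, 0.040, 0.01)
print(f"per-connection ceiling: about {estimate:.2f} Mbit/s")
```

Under those assumed numbers a single connection tops out at a few Mbit/s, far below a 200 Mbit/s link (if that is what "200M" means here), which is why measuring the real RTT and loss with iperf is worth doing before looking elsewhere.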

| username: asddongmen

Hello, has the issue been resolved? You can try the following:

  1. Check the network latency from the TiCDC machine to the target TiDB cluster machines.
  2. If network latency is not the problem, please upload the TiCDC logs together with the Sink and Dataflow panels from the TiCDC dashboard in Grafana so that we can help troubleshoot.
  3. If there is an issue with network latency, you might consider deploying the TiCDC server in the same environment as the target TiDB cluster. This can help alleviate the synchronization delay caused by network latency to some extent.
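
The latency check in step 1 and its effect on write throughput can be sketched as follows. This is a simplified model, not how TiCDC is implemented: it assumes each sink worker writes one transaction at a time and waits for the downstream reply, so the ceiling is roughly workers / (round trips × RTT). Both function names are hypothetical, and the worker count of 16 is an assumption that should be replaced with the `worker-count` set in your sink URI.

```python
import socket
import time
from typing import List


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 3.0) -> List[float]:
    """Time several TCP handshakes to host:port; returns milliseconds each."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; the handshake costs ~1 RTT.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        rtts.append((time.perf_counter() - start) * 1000)
    return rtts


def max_txn_per_second(rtt_ms: float, workers: int,
                       round_trips_per_txn: int = 1) -> float:
    """Rough ceiling when every worker waits on the network for each txn."""
    return workers / (round_trips_per_txn * rtt_ms / 1000)


# Hypothetical figures: 40 ms RTT and 16 concurrent sink workers.
print(f"about {max_txn_per_second(40, 16):.0f} txn/s at best")
```

Run `tcp_connect_rtt_ms` from the TiCDC machine against a downstream TiDB address and its SQL port (4000 by default) to get a real RTT. If the resulting ceiling is in the same range as the write QPS observed downstream, latency is a likely bottleneck, and moving TiCDC next to the target cluster as suggested in step 3 is worth trying.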