TiCDC replication progress stalls and sorter files grow dramatically

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.3.0
【Encountered Problem】
When using TiCDC for incremental replication, the checkpoint occasionally stalls and then sometimes suddenly catches up. This happened again yesterday: the checkpoint stopped advancing and the checkpoint lag reached 13 hours. In addition, the sorter files on two TiCDC nodes grew significantly.


I checked cdc.log and queried the changefeed status with cdc cli changefeed query -c xxx.

Yesterday the queried status was normal, and cdc.log kept printing "Unified Sorter: trying to create file backEnd". Now the changefeed has failed with this error:
"message": "[CDC:ErrGCTTLExceeded]the checkpoint-ts(436308863790350610) lag of the changefeed(simple-replication-task) has exceeded the GC TTL"
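
To see how far behind that checkpoint-ts actually is, it can be decoded by hand. A TiDB TSO keeps a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits, so shifting right by 18 gives the wall-clock time. The sketch below is only an illustration: the function names are made up for this post, and the 86400-second TTL is an assumption based on the default TiCDC gc-ttl, so substitute whatever your cluster is configured with.

```python
from datetime import datetime, timedelta, timezone

TSO_LOGICAL_BITS = 18  # the low 18 bits of a TiDB TSO are the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tso_to_datetime(tso: int) -> datetime:
    """Return the UTC time encoded in the physical (millisecond) part of a TSO."""
    physical_ms = tso >> TSO_LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def checkpoint_lag(checkpoint_ts: int, now: datetime) -> timedelta:
    """Return how far the checkpoint is behind `now` (a timezone-aware datetime)."""
    return now - tso_to_datetime(checkpoint_ts)


def exceeds_gc_ttl(checkpoint_ts: int, now: datetime, gc_ttl_seconds: int = 86400) -> bool:
    """Check the lag against a GC TTL.

    86400 is an assumption (the default TiCDC gc-ttl); pass your own value.
    """
    return checkpoint_lag(checkpoint_ts, now).total_seconds() > gc_ttl_seconds


if __name__ == "__main__":
    ts = 436308863790350610  # checkpoint-ts from the error message above
    now = datetime.now(timezone.utc)
    print("checkpoint time (UTC):", tso_to_datetime(ts))
    print("lag:", checkpoint_lag(ts, now))
    print("exceeds GC TTL:", exceeds_gc_ttl(ts, now))
```

With a 13-hour lag this check would still pass under a 24-hour TTL, which matches the changefeed looking normal yesterday and failing only after the checkpoint stayed stuck longer.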

During one period, 8 million rows were deleted from a table. This put a heavy load on one TiCDC node, and it could not flush to the downstream fast enough. The checkpointTs on that node then stopped advancing, with no errors reported. Is there any way to optimize this? Also, the network bandwidth from upstream to downstream is only 100M.

TiCDC v5.3.0 does not support large transactions. When replicating with TiCDC, it is recommended to keep each transaction under 100 MB; larger transactions severely slow down replication and may eventually cause it to fail. If possible, consider upgrading to TiCDC v6.1.1 and enabling the large transaction splitting feature, which can effectively solve this problem.
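
If upgrading is not an option right away, one way to stay under that size on the upstream side is to break a bulk delete into many small transactions. The following is a rough sketch, not an official tool: `delete_in_batches`, the `execute` callable, the table name, and the condition are all hypothetical. It assumes `execute` runs one statement in its own auto-committed transaction and returns the number of affected rows, and that the SQL fragments come from a trusted source, since they are interpolated directly.

```python
from typing import Callable


def delete_in_batches(
    execute: Callable[[str], int],
    table: str,
    condition: str,
    batch_size: int = 10000,
) -> int:
    """Delete matching rows in LIMITed batches and return the total deleted.

    `execute` is assumed to run the statement as its own transaction
    (autocommit) and return the affected row count. `table` and `condition`
    are interpolated as-is, so they must not come from untrusted input.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    statement = f"DELETE FROM {table} WHERE {condition} LIMIT {batch_size}"
    total = 0
    while True:
        affected = execute(statement)
        total += affected
        # A short batch means no matching rows are left.
        if affected < batch_size:
            return total


if __name__ == "__main__":
    # Stand-in executor for a dry run; a real one would wrap a MySQL-protocol
    # connection to TiDB with autocommit enabled.
    remaining = [25]

    def fake_execute(sql: str) -> int:
        n = min(10, remaining[0])
        remaining[0] -= n
        print(sql, "->", n, "rows")
        return n

    print("deleted:", delete_in_batches(fake_execute, "orders", "created_at < '2022-01-01'", 10))
```

Each batch then reaches TiCDC as a separate small transaction. A suitable batch size depends on row width, so treat 10000 as a starting point to tune.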

