Significant Delay in CDC Changefeed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc changefeed出现较大延迟

| username: Hacker_lB0K9iQY

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.2.4
【Encountered Problem: Phenomenon and Impact】One TiCDC changefeed replicating to MySQL shows significant delay, while the other two changefeeds replicating to the same MySQL instance do not.
【Resource Configuration】
3 TiDB/PD nodes with 16 cores * 64GB each, with one CDC node deployed on each machine
3 TiKV nodes with 16 cores * 64GB each

【Attachments: Screenshots/Logs/Monitoring】


eeo-tidb002-TiCDC_2022-11-23T02_34_03.571Z.json (208.3 KB)

[2022/11/22 23:30:46.205 +08:00] [INFO] [region_worker.go:243] ["single region event feed disconnected"] [changefeed=eoosfile-dqs] [regionID=2562584] [requestID=855] [span="[7480000000000001ff475f72800000000cff9e27da0000000000fa, 7480000000000001ff475f72800000000cffa821480000000000fa)"] [checkpoint=437552688802299918] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:2562584 leader:<id:2562586 store_id:1 > > : not_leader:<region_id:2562584 leader:<id:2562586 store_id:1 > > "]
[2022/11/22 23:30:46.205 +08:00] [INFO] [region_range_lock.go:383] ["unlocked range"] [changefeed=eoosfile-dqs] [lockID=3] [regionID=2562584] [startKey=7480000000000001ff475f72800000000cff9e27da0000000000fa] [endKey=7480000000000001ff475f72800000000cffa821480000000000fa] [checkpointTs=437552688802299918]
[2022/11/22 23:30:46.205 +08:00] [INFO] [region_cache.go:1102] ["switch region leader to specific leader due to kv return NotLeader"] [regionID=2562584] [currIdx=2] [leaderStoreID=1]
[2022/11/22 23:30:46.205 +08:00] [INFO] [region_range_lock.go:222] ["range locked"] [changefeed=eoosfile-dqs] [lockID=3] [regionID=2562584] [startKey=7480000000000001ff475f72800000000cff9e27da0000000000fa] [endKey=7480000000000001ff475f72800000000cffa821480000000000fa] [checkpointTs=437552688802299918]
[2022/11/22 23:30:46.205 +08:00] [INFO] [client.go:926] ["cannot get rpcCtx, retry span"] [changefeed=eoosfile-dqs] [regionID=2562584] [span="[7480000000000001ff475f72800000000cff9e27da0000000000fa, 7480000000000001ff475f72800000000cffa821480000000000fa)"]
[2022/11/22 23:30:46.205 +08:00] [INFO] [region_range_lock.go:383] ["unlocked range"] [changefeed=eoosfile-dqs] [lockID=3] [regionID=2562584] [startKey=7480000000000001ff475f72800000000cff9e27da0000000000fa] [endKey=7480000000000001ff475f72800000000cffa821480000000000fa] [checkpointTs=437552688802299918]
[2022/11/22 23:30:46.205 +08:00] [INFO] [region_range_lock.go:222] ["range locked"] [changefeed=eoosfile-dqs] [lockID=3] [regionID=2562584] [startKey=7480000000000001ff475f72800000000cff9e27da0000000000fa] [endKey=7480000000000001ff475f72800000000cffa821480000000000fa] [checkpointTs=437552688802299918]
[2022/11/22 23:30:46.206 +08:00] [INFO] [client.go:825] ["start new request"] [changefeed=eoosfile-dqs] [request="{"header":{"cluster_id":7039262237001066123,"ticdc_version":"5.2.4"},"region_id":2562584,"region_epoch":{"conf_ver":2609,"version":4353},"checkpoint_ts":437552688802299918,"start_key":"dIAAAAAAAAH/R19ygAAAAAz/nifaAAAAAAD6","end_key":"dIAAAAAAAAH/R19ygAAAAAz/qCFIAAAAAAD6","request_id":2075,"extra_op":1,"Request":null}"] [addr=10.1.38.111:20161]
[2022/11/22 23:30:46.207 +08:00] [INFO] [region_worker.go:243] ["single region event feed disconnected"] [changefeed=eoosfile-dqs] [regionID=2562584] [requestID=2075] [span="[7480000000000001ff475f72800000000cff9e27da0000000000fa, 7480000000000001ff475f72800000000cffa821480000000000fa)"] [checkpoint=437552688802299918] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:2562584 leader:<id:2562586 store_id:1 > > : not_leader:<region_id:2562584 leader:<id:2562586 store_id:1 > > "]

| username: dba-kit

If three changefeeds replicate from the same TiDB cluster to the same MySQL instance and only one of them is delayed, the problem is probably not with TiDB or MySQL as a whole. Check whether there are large transactions on the tables covered by the delayed changefeed. In the curve you posted, the sharp rise followed by a sudden drop is typical of large transactions.
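
To put a number on the delay, the checkpoint values in the logs above can be decoded. They are TSO timestamps: the upper bits hold a physical Unix time in milliseconds and the lower 18 bits hold a logical counter. Below is a minimal Python sketch; the helper names (`decode_tso`, `tso_to_datetime`, `lag_seconds`) are made up for illustration and are not part of TiCDC. Note that it decodes the per-region checkpoint from a single log line, which may differ from the changefeed-level checkpoint shown in monitoring.

```python
from datetime import datetime, timedelta, timezone

# TSO layout: physical Unix time in milliseconds in the high bits,
# an 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_tso(tso: int) -> tuple[int, int]:
    """Split a TSO into (physical_ms, logical)."""
    return tso >> LOGICAL_BITS, tso & ((1 << LOGICAL_BITS) - 1)


def tso_to_datetime(tso: int) -> datetime:
    """Return the physical part of a TSO as a timezone-aware UTC datetime."""
    physical_ms, _ = decode_tso(tso)
    return EPOCH + timedelta(milliseconds=physical_ms)


def lag_seconds(checkpoint_tso: int, observed_at: datetime) -> float:
    """Seconds between a checkpoint TSO and the time it was observed."""
    return (observed_at - tso_to_datetime(checkpoint_tso)).total_seconds()


# Values taken from the first log line in the original post.
cst = timezone(timedelta(hours=8))
log_time = datetime(2022, 11, 22, 23, 30, 46, 205000, tzinfo=cst)
checkpoint = 437552688802299918

print(tso_to_datetime(checkpoint).astimezone(cst).isoformat())
print(f"lag: {lag_seconds(checkpoint, log_time):.3f}s")
```

Running the same calculation against the changefeed's own checkpoint at several points in time would show whether the lag rises and falls in bursts, which is the pattern large transactions tend to produce.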