[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.3
[Reproduction Path] What operations were performed when the issue occurred
No operations were performed recently. Yesterday, the checkpoint lag suddenly increased and then recovered on its own. At the time, I checked the changefeed status: it was normal and reported no errors. I also filtered the logs and found no errors. What situations could cause this? I don't understand many of the TiCDC monitoring metrics; could someone experienced take a look?

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

Most likely there was a temporary problem with the connection or with data retrieval, and it has since recovered.

A large transaction may have been blocking replication. Check the SQL statements executed around the time the lag began to see whether any large transactions ran.

That's possible. I'm using TiCDC to write data to Kafka, and Kafka monitoring shows that the data volume did increase significantly yesterday, which could be explained by large transactions. Could the data volume have been high enough to reach TiCDC's maximum processing capacity and cause the lag?

That is also possible; it depends on TiCDC's resource usage. Generally, if CPU, memory, and disk I/O usage are not high, the lag should not become excessive.

I checked the monitoring: during the problem period, load, I/O, and memory were all normal, so it was most likely caused by large transactions. Thanks~

