Archiving Generates Delete Events Causing CDC Link Delays

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 归档产生delete事件导致cdc链路延迟

| username: bbhy135258

On January 18, 2024, at 20:10, several tables in TiDB were archived. While the archiving was running, the CDC link fell behind. The delay cleared only after the archiving stopped and the backlog had been consumed.

[TiDB Version] V6.1.5
[Impact of the Bug]
Every day at midnight, a job processes the previous day's data that CDC has synchronized to Kafka. If CDC is lagging at midnight, not all of that day's data has been consumed yet, so the report for that day is incomplete.

[Expected Behavior]
During archiving, CDC performance should be able to handle the volume of data generated by deletions.

[Additional Background Information or Screenshots]
Archive log screenshot:

CDC delay Grafana screenshot 1:

CDC delay Grafana screenshot 2:

CDC delay Grafana screenshot 3:

CDC delay Grafana screenshot 4:

| username: zhanggame1 | Original post link

Is archiving the process of deleting part of the original table’s data and inserting it into a historical table, all within the same database?

| username: 路在何chu | Original post link

You can create a separate archive database and set up a CDC filter rule so that the archived data is not synchronized. That way it will not add to the delay.
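
A table filter along these lines could keep an archive database out of the changefeed. This is a minimal sketch of a changefeed configuration file; the database name archive_db is a hypothetical placeholder, not something from the original post.

```toml
# Changefeed config sketch. `archive_db` is a hypothetical name for the archive database.
[filter]
# Replicate every table except those in the archive database.
rules = ['*.*', '!archive_db.*']
```

Note that a rule like this only skips the rows inserted into the archive database. Deletes on the source tables are still replicated.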

| username: Billmay表妹 | Original post link

Version 6.2 introduced an event filter… which lets you specify that delete operations on a given table are not synchronized downstream. See whether this meets your needs. To use this feature, you should upgrade to version 6.5.x.
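
As a rough sketch, an event filter in a changefeed configuration file might look like the following on a version that supports it. The table name mydb.orders is a hypothetical placeholder.

```toml
# Changefeed config sketch. `mydb.orders` is a hypothetical table being archived.
[[filter.event-filters]]
# Tables this event filter applies to.
matcher = ["mydb.orders"]
# Drop DELETE events for the matched tables instead of replicating them.
ignore-event = ["delete"]
```

With a rule like this the downstream never sees the deletions, so rows removed by archiving stay in the downstream data. Whether that is acceptable depends on how the reports use it.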

| username: flow-PingCAP | Original post link

The bottleneck might be that the sink is writing downstream too slowly. If the downstream is MySQL or TiDB, try increasing worker-count (see the MySQL/TiDB sink configuration section of the docs).

| username: yiduoyunQ | Original post link

  1. Is TiCDC deployed on the upstream or the downstream side, and what is the network latency between upstream and downstream? See TiCDC FAQs | PingCAP Docs.
  2. Check for large transactions upstream and write bottlenecks downstream; see TiCDC FAQs | PingCAP Docs.

| username: Jellybean | Original post link

Your CDC performance bottleneck is at the sorter stage. The puller stage is normal, but the sorter is relatively slow, which limits the QPS of the synchronization stages after it.

You can perform targeted tuning.

| username: Jellybean | Original post link

You can refer to my previous experience in optimizing CDC. If your TiCDC synchronization performance is not improving and you cannot find a good solution, try increasing the per-table-memory-quota parameter. It may help more than you expect.
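
For reference, a sketch of how that parameter might be raised. It assumes per-table-memory-quota is set as a top-level key in the TiCDC server configuration file, as in 6.1-era releases. The value shown is purely illustrative and is not a recommendation from this thread.

```toml
# TiCDC server config sketch (assumption: top-level key, value in bytes).
# 104857600 bytes = 100 MiB; an illustrative value only, tune against available memory.
per-table-memory-quota = 104857600
```

A larger quota means more memory per replicated table on each TiCDC node, so it is worth checking node memory headroom before raising it.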

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.