TiCDC Not Progressing in Version 5.4.1: Issues Persist After Upgrading from 5.2 to 5.4.1

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiCDC not progressing on version 5.4.1; we upgraded from 5.2 to 5.4.1 and the problem persists

| username: Hacker_ZcrkjsVg

There are warnings in the logs.

| username: kkpeter | Original post link

Is this a major issue that is still unresolved? I saw an earlier post saying that TiCDC got stuck because of batch deletions in the upstream.

| username: Hacker_ubN7WXjw | Original post link

We also encountered this problem, and we don’t dare to use TiCDC anymore.

| username: Hacker_ZcrkjsVg | Original post link

It is not as stable as Pump and Drainer, even though it is the officially recommended tool. I have already upgraded to 5.4.1.

| username: neilshen | Original post link

Please provide the TiCDC, TiKV, and TiDB monitoring data for 30 minutes before and after the issue occurred.

| username: Hacker_ZcrkjsVg | Original post link

There seems to be a peak around 3 AM.

| username: Hacker_ZcrkjsVg | Original post link

There are also performance peaks, which are probably caused by background or nighttime jobs.

| username: HACK | Original post link

I had the same issue with version 4.0.6 before.

| username: Hacker_ZcrkjsVg | Original post link

We have currently upgraded to 5.4.1 and still have this issue.

| username: qizheng | Original post link

You can export snapshots of the TiDB, TiKV-Details, and TiCDC monitoring dashboards for this period so we can take a look.

| username: dba-kit | Original post link

It looks like it automatically recovered around 11 o’clock?

| username: 代码工地头号民工 | Original post link

Yes, after a few hours, the CDC checkpoint became normal and the task was no longer delayed.

| username: qhd2004 | Original post link

The export covers 2022-07-08 from 04:00 to 05:00. Attachment: Downloads.zip (2.0 MB)

| username: cs58_dba | Original post link

Probably the only option is to optimize the SQL as much as possible and split large transactions into smaller ones.
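As a rough illustration of splitting a large batch delete into small transactions, here is a minimal sketch. It uses Python's built-in `sqlite3` module only as a stand-in for a TiDB/MySQL connection so that it runs anywhere; the function name `delete_in_batches`, the `id` primary-key column, and the `orders` demo table are hypothetical. Against TiDB you would use a MySQL driver instead (for example PyMySQL, whose placeholder style is `%s` rather than `?`), and you should tune `batch_size` for your workload.

```python
import sqlite3


def delete_in_batches(conn, table, where_sql, params=(), batch_size=500):
    """Delete rows matching where_sql using many small transactions.

    Hypothetical helper: assumes `table` has an integer primary key `id`,
    and that `table` and `where_sql` are trusted strings (they are
    interpolated into the SQL). Each loop iteration touches at most
    batch_size rows and commits, so no single transaction is large.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Pick the next batch of primary keys to delete.
        ids = [row[0] for row in conn.execute(
            f"SELECT id FROM {table} WHERE {where_sql} ORDER BY id LIMIT ?",
            (*params, batch_size),
        )]
        if not ids:
            break
        placeholders = ",".join(["?"] * len(ids))
        conn.execute(f"DELETE FROM {table} WHERE id IN ({placeholders})", ids)
        conn.commit()  # end this short transaction before the next batch
        total += len(ids)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [("expired",)] * 2500 + [("active",)] * 10,
    )
    conn.commit()
    print(delete_in_batches(conn, "orders", "status = ?", ("expired",)))
```

The trade-off is that the delete is no longer atomic: if the job stops halfway, some rows are already gone, so the job should be safe to rerun.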

| username: Hacker_ZcrkjsVg | Original post link

Monitoring data has been submitted. Please help analyze it.

| username: neilshen | Original post link

Based on these two screenshots, I suspect the issue was caused by one of the following:

  • There is a long-running transaction in the upstream TiDB, which prevents TiCDC from advancing the checkpoint.
  • A large transaction was executed in the upstream TiDB, and TiCDC processes large transactions relatively inefficiently. While that transaction is being processed, TiCDC cannot advance the checkpoint.
  • An unexpected crash occurred in the upstream TiDB or TiKV, resulting in residual transaction locks, which also prevent TiCDC from advancing the checkpoint.
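All three cases have the same effect: a lock that is still pending holds back the changefeed. The following toy model is a simplification for illustration only, not TiCDC's or TiKV's actual code, and every name in it is hypothetical. It assumes a region's resolved ts cannot pass the `start_ts` of any lock still pending in that region (the transaction might yet commit), and that the changefeed checkpoint can only advance to the slowest region.

```python
from dataclasses import dataclass


@dataclass
class Lock:
    """A pending transaction lock (hypothetical, simplified)."""
    key: str
    start_ts: int


def resolved_ts(current_ts: int, pending_locks: list) -> int:
    """Simplified model: a region's resolved ts stays at or below the
    start_ts of its oldest pending lock, because that transaction may
    still commit at a later timestamp."""
    if not pending_locks:
        return current_ts
    return min(current_ts, min(lock.start_ts for lock in pending_locks))


def checkpoint_ts(region_resolved: list) -> int:
    """The checkpoint can only advance as far as the slowest region."""
    return min(region_resolved)


if __name__ == "__main__":
    now = 500
    # Region 3 holds a lock from a long-running (or crashed) transaction.
    regions = [[], [], [Lock("k1", start_ts=100)]]
    print(checkpoint_ts([resolved_ts(now, locks) for locks in regions]))
    # Once the lock is committed, rolled back, or cleaned up, it moves on.
    regions[2].clear()
    print(checkpoint_ts([resolved_ts(now, locks) for locks in regions]))
```

In this model a single stuck lock pins the whole changefeed, which matches the symptom in this thread: the checkpoint stops moving until the long or large transaction finishes or the residual lock is resolved.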

You need to check this against your specific upstream workload. If it is one of the first two cases, adjust how the business writes data and use small transactions wherever possible. If it is the last case, more monitoring data and logs are needed for further analysis.

| username: Hacker_ZcrkjsVg | Original post link

There are indeed large transactions at night, but they are a business requirement for our nighttime operations. The TiCDC team said this issue was fixed in a certain version, yet upgrading to the latest 5.x release still doesn't solve it. Pump and Drainer currently do not have this issue.

| username: kkpeter | Original post link

CDC is indeed difficult to use.

| username: asddongmen | Original post link

TiCDC v6.2 provides a feature for splitting large transactions. If you do not need transactional atomicity downstream, consider upgrading to TiCDC v6.2 and enabling transaction splitting, which should effectively solve the issue above.
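To make the "atomicity not required" condition concrete, here is a conceptual sketch of what splitting a transaction means for the downstream. It does not show TiCDC's implementation or its configuration option; the helper names and the `max_rows` parameter are hypothetical. One upstream transaction's row changes are applied downstream in several smaller commits, so a reader can see intermediate, partially applied states.

```python
def split_transaction(rows, max_rows):
    """Split one upstream transaction's row changes into batches of at most
    max_rows rows; in this model each batch is committed separately."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def visible_states(rows, max_rows):
    """Return what a downstream reader could observe after each batch
    commits: every state except the last is only partially applied."""
    applied = []
    states = []
    for batch in split_transaction(rows, max_rows):
        applied.extend(batch)
        states.append(list(applied))
    return states


if __name__ == "__main__":
    for state in visible_states(["r1", "r2", "r3", "r4", "r5"], max_rows=2):
        print(state)
```

Smaller commits let the sink make steady progress instead of buffering one huge transaction, at the cost of downstream readers possibly observing a half-applied upstream transaction.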

| username: system | Original post link

This topic was automatically closed 1 minute after the last reply. No new replies are allowed.