TiCDC Kafka Replication Checkpoint Latency

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 同步kafka checkpoint 延迟

| username: 爱白话的晓辉

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4.2
[Reproduction Path] The checkpoint timestamp in CDC suddenly stops updating
[Encountered Problem: Symptoms and Impact]
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]



How can I skip this error or modify the configuration?

| username: 像风一样的男子 | Original post link

Modify the downstream Kafka configuration:

Maximum message size the broker will accept:
message.max.bytes=2147483648

Maximum message size the broker can replicate:
replica.fetch.max.bytes=2147483648

Maximum message size a consumer can fetch:
fetch.message.max.bytes=2147483648

The official documentation describes this solution as well.
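As a hedged sketch of how the three settings above might look in each broker's server.properties (the 10 MB value is an example, not a recommendation; note that these Kafka settings are 32-bit integers, so 2147483648 overflows the allowed range and 2147483647 is the effective ceiling):

```properties
# server.properties on each broker (restart typically required)
# Example values: 10 MB; adjust to the largest message TiCDC will emit
message.max.bytes=10485760
replica.fetch.max.bytes=10485760
```

The consumer-side fetch limit (fetch.message.max.bytes in older consumers, fetch.max.bytes in newer clients) is set in the consumer's own configuration, not on the broker.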

| username: 爱白话的晓辉 | Original post link

The Kafka we are using is Alibaba Cloud's managed service, and its message size limit has already been raised to its 10 MB maximum. What can be adjusted on the TiCDC side instead?

| username: 像风一样的男子 | Original post link

Reduce the max-message-bytes parameter in TiCDC's Kafka sink URI.
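A minimal sketch of lowering max-message-bytes on an existing changefeed; the PD address, changefeed ID, and topic name are placeholders, and the changefeed must be paused before its sink URI can be updated:

```shell
# Pause, update the sink's max message size (1 MB here as an example), then resume
cdc cli changefeed pause  --pd=http://10.0.0.1:2379 --changefeed-id=my-feed
cdc cli changefeed update --pd=http://10.0.0.1:2379 --changefeed-id=my-feed \
  --sink-uri="kafka://10.0.0.2:9092/my-topic?max-message-bytes=1048576"
cdc cli changefeed resume --pd=http://10.0.0.1:2379 --changefeed-id=my-feed
```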

| username: Anna | Original post link

Based on your description, the issue is likely that the TiCDC checkpoint TSO has stopped advancing. TiCDC periodically records the checkpoint TSO so that replication tasks can be resumed from it; when the checkpoint TSO stops moving, the replication task has stalled.

To resolve this issue, check whether the TiCDC synchronization tasks and monitoring metrics are normal. You can follow these steps:

  1. Check whether the TiCDC replication tasks are healthy. Use the TiCDC command-line tool (tiup ctl cdc) or the monitoring panels to view the status of the replication tasks. If the task status is normal, you can rule out problems caused by abnormal tasks.
  2. Check whether the TiCDC monitoring metrics are normal. In the TiCDC Grafana dashboard, watch metrics such as checkpoint TSO and resolved TS for abnormal fluctuations or values that stop changing.
  3. If the metrics are abnormal, try restarting the TiCDC nodes, or use TiCDC's dynamic configuration adjustment to tune parameters gradually and observe whether the metrics recover.
  4. If the metrics are normal but alerts persist, adjust the alert rules, such as thresholds or intervals, to reduce false positives.
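For step 1, a hedged sketch of inspecting changefeed status from the command line (the PD address and changefeed ID are placeholders; the version tag should match your cluster):

```shell
# List all changefeeds and their state
tiup ctl:v5.4.2 cdc changefeed list  --pd=http://10.0.0.1:2379

# Query one changefeed in detail; check whether checkpoint-tso keeps advancing
tiup ctl:v5.4.2 cdc changefeed query --pd=http://10.0.0.1:2379 --changefeed-id=my-feed
```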

Note that TiCDC synchronization tasks are complex and require analysis and resolution based on specific circumstances. It is recommended to back up your data before resolving the issue to prevent data loss. Additionally, when deploying TiCDC, set the parameters reasonably according to actual business conditions and hardware configurations to avoid issues such as abnormal synchronization tasks or monitoring metrics.

| username: Anna | Original post link

The image is not visible. Please provide the text you need translated.

| username: 爱白话的晓辉 | Original post link

Is there a parameter to skip erroneous data?

| username: 爱白话的晓辉 | Original post link

The max-message-bytes parameter has already been reduced to 1040000.

| username: yilong | Original post link

You can try using ignore-txn-start-ts to skip the offending transaction.
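A minimal sketch of a changefeed configuration file using this filter; the start_ts value is a placeholder, and you would take the real start_ts of the failing transaction from the TiCDC error log:

```toml
# changefeed.toml -- skip specific transactions by their start_ts
[filter]
# Placeholder value; replace with the start_ts reported in the cdc log
ignore-txn-start-ts = [434661348229316612]
```

The file can then be applied with something like cdc cli changefeed update --config changefeed.toml (with the changefeed paused first).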