Issue of Kafka Message Sending Failure When Upgrading TiCDC from v5.3.0 to v6.1.1

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc 从v5.3.0升级到v6.1.1出现kafka消息发送失败问题

| username: porpoiselxj

[TiDB Usage Environment] Test
[TiDB Version] v6.1.1
[Encountered Problem]
After upgrading from v5.3.0 to v6.1.1, replication from cdc to kafka became blocked. The error is shown in the attachment (the maximum message size configured for kafka is 250M).

[Reproduction Path]
No configuration changes were made after the upgrade, with one exception: the protocol was changed from default to open-protocol, because otherwise cdc reports an error saying that the protocol is not set.
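For reference, a minimal sketch of how a Kafka sink URI with an explicit protocol can be assembled. The `protocol` and `max-message-bytes` query parameters are TiCDC sink URI options; the broker address, topic name, and the helper `build_kafka_sink_uri` are hypothetical and only for illustration. The 250M value mirrors the broker-side limit mentioned above and is an assumption about how one might align the two settings, not the configuration actually used in this thread.

```python
from urllib.parse import urlencode

def build_kafka_sink_uri(brokers, topic, protocol="open-protocol", max_message_bytes=None):
    """Assemble a TiCDC-style Kafka sink URI (hypothetical helper)."""
    # The protocol is set explicitly, since v6.1.1 rejects a changefeed without one.
    params = {"protocol": protocol}
    if max_message_bytes is not None:
        # Upper bound on the size of a message sent by the producer, in bytes.
        params["max-message-bytes"] = str(max_message_bytes)
    return f"kafka://{','.join(brokers)}/{topic}?{urlencode(params)}"

# Hypothetical broker and topic; 250 MiB expressed in bytes.
sink_uri = build_kafka_sink_uri(
    ["127.0.0.1:9092"], "cdc-test", max_message_bytes=250 * 1024 * 1024
)
print(sink_uri)
```

The resulting string would be passed as the `--sink-uri` argument when creating the changefeed.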

[Problem Phenomenon and Impact]
After cdc gets stuck, it keeps sending messages to kafka. Incremental logs from a past period are sent over and over, which saturates kafka traffic while the cdc progress (checkpoint) stops advancing. It looks as if one particular table is the problem: the tables parsed before this problematic table are caught in an infinite loop and are retried repeatedly.
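One way to confirm the replay from the consumer side is to watch for commit timestamps that go backwards. The sketch below assumes the open-protocol message keys have already been decoded into dicts with the `scm` (schema), `tbl` (table), and `ts` (commit timestamp) fields; the decoding step, the function name `count_replayed_events`, and the sample data are hypothetical. It also assumes that events of one table arrive in commit-ts order within the consumed stream, which may not hold for every dispatcher setting.

```python
from collections import defaultdict

def count_replayed_events(decoded_keys):
    """Count, per table, events whose commit ts is lower than one already seen.

    decoded_keys: iterable of dicts with "scm", "tbl" and "ts" fields
    (assumed to be pre-decoded open-protocol message keys).
    """
    high_water = {}              # (schema, table) -> highest commit ts seen so far
    replayed = defaultdict(int)  # (schema, table) -> number of regressed events
    for key in decoded_keys:
        table = (key["scm"], key["tbl"])
        ts = key["ts"]
        if table in high_water and ts < high_water[table]:
            # Strictly lower than the high-water mark: likely a re-sent event.
            # Equal ts is not counted, since rows of one transaction share a ts.
            replayed[table] += 1
        else:
            high_water[table] = ts
    return dict(replayed)

# Made-up sample: the event with ts=100 shows up again after ts=105.
sample = [
    {"scm": "test", "tbl": "t1", "ts": 100},
    {"scm": "test", "tbl": "t1", "ts": 105},
    {"scm": "test", "tbl": "t1", "ts": 100},
    {"scm": "test", "tbl": "t1", "ts": 105},
    {"scm": "test", "tbl": "t1", "ts": 110},
]
replay_counts = count_replayed_events(sample)
print(replay_counts)
```

A table whose count keeps growing while the changefeed checkpoint stays flat would match the retry loop described above.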

Judging from the error stack, the sarama source code (IBM/sarama, a Go library for Apache Kafka) shows no significant changes between v1.29.2, which the new version depends on, and v1.27.2, which the old version v5.3.0 depends on. Details are as follows:

This parameter is hardcoded.

[Attachment]

| username: TiDBer_CEVsub

It’s better not to upgrade to a higher version.

| username: zhouzeru

You can try clearing Kafka’s snapshots in Zookeeper and reconnecting to see if it works.