CDC writing to Kafka error: write: connection reset by peer

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: cdc 写入kafka 报错write: connection reset by peer

| username: 是我的海

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.3, downstream Kafka version v2.4.2
[Reproduction Path] Error occurs after running for a few days using index-value partition mode
[Encountered Problem: Phenomenon and Impact]
Data from the TiDB cluster is written to Kafka via CDC, configured as follows:

case-sensitive = true
enable-old-value = true

[filter]
ignore-txn-start-ts = [1, 2]
rules = ['study_info.*','rtquery.*']
[mounter]
worker-num = 8
[sink]
dispatchers = [
    {matcher = ['study_info.*'], partition = "index-value"},
]
protocol = "canal-json"

After running for a while, synchronization is interrupted with the following error:

"[CDC:ErrKafkaAsyncSendMessage]kafka async send message failed: kafka: Failed to produce message to topic tidb_rtquery_new: write tcp xxx.129.15:38098->xxxx.150.36:9095: write: connection reset by peer\"}

xxx.129.15 is our CDC node, xxx.150.36 is the Tencent Cloud Kafka address.

Reproduction and Testing of the Above Issue:

  1. If the partition is changed to table, synchronization works fine without errors.
  2. If the partition is changed back to index-value, the error reoccurs.
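
The two modes differ in how rows are spread over partitions, which may be relevant to why only one of them fails. The Python sketch below illustrates the idea only: the hash (`zlib.crc32`) and the helper names are assumptions made for illustration, not TiCDC's actual implementation. With `table`, every row of a table maps to one partition; with `index-value`, rows are spread by their primary-key or unique-index value, so a single table can reach all six partitions, which may be led by different brokers.

```python
import zlib

PARTITION_NUM = 6  # matches partition-num=6 in the sink URI


def _bucket(key: str, partition_num: int = PARTITION_NUM) -> int:
    """Map a string key to a partition (illustrative hash, not TiCDC's)."""
    return zlib.crc32(key.encode("utf-8")) % partition_num


def partition_by_table(schema: str, table: str) -> int:
    """'table' mode: the key depends on the schema and table name only."""
    return _bucket(f"{schema}.{table}")


def partition_by_index_value(schema: str, table: str, index_value) -> int:
    """'index-value' mode: the key also depends on the row's index value."""
    return _bucket(f"{schema}.{table}:{index_value}")


if __name__ == "__main__":
    rows = range(1000)  # hypothetical primary-key values
    by_table = {partition_by_table("study_info", "t1") for _ in rows}
    by_index = {partition_by_index_value("study_info", "t1", pk) for pk in rows}
    print("table mode partitions:      ", sorted(by_table))
    print("index-value mode partitions:", sorted(by_index))
```

If the error follows the number of partitions or brokers being written to, this difference would be one place to look; the thread does not confirm that this is the cause.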

Question:
When creating the CDC task for the downstream Kafka, the address used is port 9092 (the error message shows port 9095). Why does the exception occur only in index-value mode?
The tests above have been repeated multiple times: table mode synchronizes normally, while index-value mode produces errors.
--sink-uri="kafka://xxx.150.36:9092/tidb_rtquery_new?protocol=canal-json&partition-num=6&max-message-bytes=10485760&replication-factor=1"
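
To double-check what the changefeed actually requests, the sink URI can be decomposed with Python's standard library. This is only a reading aid: `describe_sink_uri` is a hypothetical helper, and the host is the masked value from the post. Note that Kafka clients generally bootstrap from one address and then connect to the broker addresses returned in the cluster metadata, which may explain why the error shows port 9095 rather than 9092.

```python
from urllib.parse import parse_qs, urlsplit

# Sink URI as given in the post (host is masked in the original).
SINK_URI = (
    "kafka://xxx.150.36:9092/tidb_rtquery_new"
    "?protocol=canal-json&partition-num=6"
    "&max-message-bytes=10485760&replication-factor=1"
)


def describe_sink_uri(uri: str) -> dict:
    """Split a Kafka sink URI into host, port, topic and query parameters."""
    parts = urlsplit(uri)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.hostname,
        "port": parts.port,
        "topic": parts.path.lstrip("/"),
        "params": params,
    }


if __name__ == "__main__":
    info = describe_sink_uri(SINK_URI)
    print(info)
    # max-message-bytes is expressed in bytes; convert to MiB for readability.
    max_bytes = int(info["params"]["max-message-bytes"])
    print("max-message-bytes in MiB:", max_bytes / (1024 * 1024))
```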

Below are the errors encountered during testing. I initially thought it was a TiDB version issue, but after upgrading from 6.5.1 to 6.5.3 this morning, the same problem persists.


Supplement on 2024-02-01:
Found the following error logs on the Tencent Cloud server


Tencent Cloud R&D suggested setting the send.buffer.bytes parameter on the CDC side to a smaller value, below 16 KB. However, this parameter does not appear to be exposed in the documentation. Is there any other way to work around this?
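
For context, in the Java Kafka client `send.buffer.bytes` sets the TCP send buffer (`SO_SNDBUF`) of the socket. The sketch below shows that socket option using Python's standard library. It does not configure TiCDC, and whether TiCDC exposes an equivalent setting is not established in this thread; `effective_send_buffer` and the 8 KB value are hypothetical. The operating system may round or adjust the requested size, so the effective value is read back rather than assumed.

```python
import socket

REQUESTED_BYTES = 8 * 1024  # hypothetical value below the suggested 16 KB


def effective_send_buffer(requested: int) -> int:
    """Request a TCP send buffer size and return what the OS actually applied."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    print("requested:", REQUESTED_BYTES)
    print("effective:", effective_send_buffer(REQUESTED_BYTES))
```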

| username: Fly-bird | Original post link

What is the Kafka version?

| username: ajin0514 | Original post link

What is the version?

| username: 是我的海 | Original post link

The Kafka version is 2.4.2, as noted in the original post.

| username: army | Original post link

Try increasing the socket.request.max.bytes configuration on the Kafka side.

| username: 是我的海 | Original post link

The Kafka cluster is Tencent Cloud's managed service, and they don't allow configuration changes :slightly_frowning_face:

| username: Fly-bird | Original post link

The Kafka version is probably set incorrectly; this is a commonly encountered problem.

| username: 是我的海 | Original post link

I confirmed the version with them: it is v2.4.1. The changefeed was created with the following:
--sink-uri="kafka://xxxxxx:9092/tidb_xxxx?protocol=canal-json&kafka-version=2.4.1&


How did you solve this problem?

| username: 像风一样的男子 | Original post link

Is “replication” misspelled?

| username: 双开门变频冰箱 | Original post link

What version is it?

| username: 数据库真NB | Original post link

The version is incorrect, right?