TiDB5.4.X: CDC Task Error: TiCDC cannot deliver messages when the `replication-factor` is less than `min.insync.replicas`

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB5.4.X:CDC任务报错:TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas

| username: Cloud王

【TiDB Usage Environment】Testing
【TiDB Version】5.4.0 - 5.4.3
【Encountered Issue】TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas
【Issue Phenomenon and Impact】

  1. Issue Phenomenon
    After upgrading from 4.0.15 to 5.4.3, a CDC task that had previously been working normally started failing.
    CDC task error:
    "message": "[CDC:ErrKafkaNewSaramaProducer]new sarama producer: [CDC:ErrKafkaInvalidConfig]because TiCDC Kafka producer's request.required.acks defaults to -1, TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas: replication-factor cannot be smaller than the min.insync.replicas of topic"

    CDC error log:
    [ERROR] [changefeed.go:119] ["an error occurred in Owner"] [changefeed=testcdc0-testcdc-t5] [error="[CDC:ErrKafkaNewSaramaProducer]new sarama producer: [CDC:ErrKafkaInvalidConfig]because TiCDC Kafka producer's request.required.acks defaults to -1, TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas: replication-factor cannot be smaller than the min.insync.replicas of topic"]

  2. Detailed Testing
    According to the error message, when request.required.acks is -1, the Kafka replication-factor must not be less than min.insync.replicas. However, actual testing revealed the following:

When min.insync.replicas is 1, setting replication-factor to 1, 2, or 3 results in normal synchronization.
When min.insync.replicas is 2, setting replication-factor to 1, 2, or 3 results in errors.
When min.insync.replicas is 3, setting replication-factor to 1, 2, or 3 results in errors.

The problem is that min.insync.replicas=2 with replication-factor=3 also fails, for both new and existing CDC tasks, even though 3 is not less than 2. Production Kafka clusters are commonly configured this way, so this error can have a significant impact on CDC tasks.

It is also unclear why min.insync.replicas=2 with replication-factor=2 fails, while min.insync.replicas=1 with replication-factor=1 works, even though the two values are equal in both cases.

  3. Affected Versions
    Testing shows that this CDC error occurs both after upgrading to 5.4.x and on a fresh 5.4.x installation. It does not occur on 4.0.x or on 5.0.x through 5.3.x.
| username: Meditator

Speculation:
In CDC 4.x, the default value of the request.required.acks parameter for the CDC producer is 1.
In CDC 5.4.x, the default value was changed to -1, which means a message is considered successfully sent only after all in-sync replicas (ISR) have acknowledged it.
Additionally, the min.insync.replicas parameter only takes effect when request.required.acks is set to -1.
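The acks and min.insync.replicas interaction described above can be illustrated with a toy model of the check a Kafka partition leader applies to a produce request. This is a simplified sketch, not Kafka's code: the function name accepts_write and its parameters are hypothetical, and it ignores details such as timeouts and retries.

```python
def accepts_write(acks: int, isr_size: int, min_insync_replicas: int) -> bool:
    """Toy model (hypothetical): does the partition leader accept a write?

    acks=-1 ("all"): the write must reach every in-sync replica, and the
    broker rejects it outright if the ISR is smaller than
    min.insync.replicas.
    acks=0 or acks=1: min.insync.replicas is not consulted at all.
    """
    if acks == -1:
        return isr_size >= min_insync_replicas
    return True


if __name__ == "__main__":
    # Healthy topic: 3 replicas in sync, min.insync.replicas=2.
    print(accepts_write(-1, isr_size=3, min_insync_replicas=2))  # True
    # Two brokers down with acks=-1: 1 in sync < 2 required.
    print(accepts_write(-1, isr_size=1, min_insync_replicas=2))  # False
    # Same situation with acks=1: min.insync.replicas is ignored.
    print(accepts_write(1, isr_size=1, min_insync_replicas=2))   # True
```

This is why the 4.x default of acks=1 never ran into min.insync.replicas, while the 5.4.x default of -1 does.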

| username: hi-rustin

This is not the expected behavior. What is the default replication-factor of your cluster? What does the sink-uri look like when you create the changefeed? TiCDC reports this error only when replication-factor < min.insync.replicas; it does not report it when replication-factor is equal to or greater than min.insync.replicas. Is your topic new or pre-existing? If it already existed, what replication-factor was it created with?
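A minimal sketch of the rule described above, assuming TiCDC simply compares the two numbers before creating the producer. The function name check_replication_factor is hypothetical; TiCDC's real check is written in Go and may differ in detail. The reasoning behind such a check is that a topic can never have more in-sync replicas than replicas, so with acks=-1 every write to a topic whose replication-factor is below min.insync.replicas would be rejected by the broker anyway.

```python
def check_replication_factor(replication_factor: int, min_insync_replicas: int) -> None:
    """Hypothetical pre-flight check: fail only if rf < min.insync.replicas.

    Equal or greater values pass; only a strictly smaller replication
    factor is rejected, mirroring the error text TiCDC prints.
    """
    if replication_factor < min_insync_replicas:
        raise ValueError(
            "replication-factor cannot be smaller than the "
            "min.insync.replicas of topic "
            f"(replicationFactor={replication_factor}, "
            f"minInsyncReplicas={min_insync_replicas})"
        )


if __name__ == "__main__":
    for rf, min_isr in [(3, 2), (2, 2), (1, 2)]:
        try:
            check_replication_factor(rf, min_isr)
            print(rf, min_isr, "ok")
        except ValueError as err:
            print(rf, min_isr, "error:", err)
```

Under this rule, the combinations replication-factor=3 or 2 with min.insync.replicas=2 should pass, which is why the failures reported in the first post point to something other than the topic settings.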

| username: Cloud王

The topic was manually created in advance:
kafka-topics.sh --create --zookeeper XXX --replication-factor 3 --partitions 3 --topic test1

The sink-uri looks like this:
/usr/bin/cdc cli changefeed create --pd=XXX --start-ts=XXX --sink-uri="kafka://XXX/test1?message.max.bytes=2147483648?partition-num=3

| username: hi-rustin

Try adding replication-factor=3 to the sink-uri.

| username: Cloud王

I changed the sink-uri to the following, but still get the error:
/usr/bin/cdc cli changefeed create --pd=XXX --start-ts=XXX --sink-uri="kafka://XXX/test1?message.max.bytes=2147483648?partition-num=3?replication-factor=3

| username: hi-rustin

This is very strange. Can you post the complete creation steps and the error logs after the change? Also, please check your topic's configuration.

| username: Cloud王

  1. Create Kafka topic:
    /data/kafka/kafka_2.12-2.4.1/bin/kafka-topics.sh --create --zookeeper XXX:2181 --replication-factor 3 --partitions 3 --config min.insync.replicas=2 --topic test3

  2. View topic properties:
    /data/kafka/kafka_2.12-2.4.1/bin/kafka-topics.sh --describe --bootstrap-server XXX:9092 --topic test3

Topic: test3 PartitionCount: 3 ReplicationFactor: 3 Configs: min.insync.replicas=2,segment.bytes=1073741824
Topic: test3 Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Topic: test3 Partition: 1 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: test3 Partition: 2 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2

  3. Create CDC task:
    /usr/bin/cdc cli changefeed create --pd=XXX --start-ts=436935319064936449 --sink-uri="kafka://XXX:9092/test3?message.max.bytes=2147483648?partition-num=3?replication-factor=3" --changefeed-id="testcdc0" --config=/home/tidb/testcdc_yaml/testcdc0_testcdc_t0.yaml

  4. Configuration file:
    case-sensitive = true
    enable-old-value = true

    [filter]
    rules = [
        "testcdc0.testcdc_t0"
    ]

    [mounter]
    worker-num = 8

    [sink]
    dispatchers = [
        {matcher = [
            "testcdc0.testcdc_t0"
        ], dispatcher = "rowid"},
    ]
    protocol = "canal-json"

    [cyclic-replication]
    enable = false
    replica-id = 1

  5. Output of task creation:
    Create changefeed successfully!
    ID: testcdc0
    Info: {"sink-uri":"kafka://XXX:9092/test3?message.max.bytes=2147483648?partition-num=3?replication-factor=3","opts":{},"create-time":"2022-10-26T17:19:45.759358278+08:00","start-ts":436935319064936449,"target-ts":0,"admin-job-type":0,"sort-engine":"unified","sort-dir":"","config":{"case-sensitive":true,"enable-old-value":true,"force-replicate":false,"check-gc-safe-point":true,"filter":{"rules":["testcdc0.testcdc_t0"],"ignore-txn-start-ts":null},"mounter":{"worker-num":8},"sink":{"dispatchers":[{"matcher":["testcdc0.testcdc_t0"],"dispatcher":"rowid"}],"protocol":"canal-json"},"cyclic-replication":{"enable":false,"replica-id":1,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null,"sync-point-enabled":false,"sync-point-interval":600000000000,"creator-version":"v4.0.16"}

  6. Task error message:
    cdc cli changefeed query -s --pd=http://XXX:2379 --changefeed-id=testcdc0
    {
      "state": "error",
      "tso": 436935319064936449,
      "checkpoint": "2022-10-26 17:19:26.892",
      "error": {
        "addr": "172.16.72.22:8300",
        "code": "CDC:ErrKafkaNewSaramaProducer",
        "message": "[CDC:ErrKafkaNewSaramaProducer]new sarama producer: [CDC:ErrKafkaInvalidConfig]because TiCDC Kafka producer's request.required.acks defaults to -1, TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas: replication-factor cannot be smaller than the min.insync.replicas of topic"
      }
    }

  7. Error log:
    [2022/10/26 17:20:16.193 +08:00] [ERROR] [kafka.go:571] ["replication-factor cannot be smaller than the min.insync.replicas of topic"] [replicationFactor=1] [minInsyncReplicas=2]
    [2022/10/26 17:20:16.581 +08:00] [ERROR] [changefeed.go:119] ["an error occurred in Owner"] [changefeed=testcdc0] [error="[CDC:ErrKafkaNewSaramaProducer]new sarama producer: [CDC:ErrKafkaInvalidConfig]because TiCDC Kafka producer's request.required.acks defaults to -1, TiCDC cannot deliver messages when the replication-factor is less than min.insync.replicas: replication-factor cannot be smaller than the min.insync.replicas of topic"]

The error log contains [replicationFactor=1] [minInsyncReplicas=2], even though the topic's replication factor is clearly set to 3.
This must be where the problem lies: the topic was created with a replication factor of 3, but I don't know why CDC thinks replicationFactor=1.

This would also explain why min.insync.replicas=2 with replication-factor=2 fails while min.insync.replicas=1 with replication-factor=1 works:
CDC probably always sees a replication factor of 1, no matter how the topic was created.
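This hypothesis can be checked against the test matrix from the first post. The sketch below assumes the check is a plain replication-factor < min.insync.replicas comparison and evaluates two scenarios: TiCDC using the topic's real replication factor, and TiCDC always using 1. Only the second reproduces the observed results (every combination works with min.insync.replicas=1, every combination fails with 2 or 3). The function names are hypothetical.

```python
def changefeed_fails(effective_rf: int, min_insync_replicas: int) -> bool:
    # Assumed rule: error iff the replication factor TiCDC uses is
    # strictly smaller than the topic's min.insync.replicas.
    return effective_rf < min_insync_replicas


def matrix(force_rf_to_one: bool) -> dict:
    """Outcome for each tested (min.insync.replicas, replication-factor) pair."""
    results = {}
    for min_isr in (1, 2, 3):
        for topic_rf in (1, 2, 3):
            # Either honor the topic's setting or pretend TiCDC always sees 1.
            effective_rf = 1 if force_rf_to_one else topic_rf
            results[(min_isr, topic_rf)] = (
                "error" if changefeed_fails(effective_rf, min_isr) else "ok"
            )
    return results


if __name__ == "__main__":
    for forced in (False, True):
        print("replication factor forced to 1:", forced)
        res = matrix(forced)
        for min_isr in (1, 2, 3):
            row = [res[(min_isr, rf)] for rf in (1, 2, 3)]
            print(f"  min.insync.replicas={min_isr}: rf=1,2,3 -> {row}")
```

With the topic's real replication factor, the matrix would be a staircase (more combinations passing as replication-factor grows), which is not what the tests showed.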

| username: hi-rustin

Your sink-uri is incorrect. Parameters must be separated with &, not ?; the format is xxx?a1=1&a2=2&a3=3.
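To see why TiCDC reported replicationFactor=1, the two sink URIs can be run through a generic URL parser. The sketch below uses Python's urllib.parse as a stand-in for the Go URL parsing TiCDC performs; this is an assumption, based on both treating everything after the first ? as the query and splitting the query only on &. The helper names and the DEFAULT_REPLICATION_FACTOR constant are hypothetical; the value 1 matches the replicationFactor=1 in the error log.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed fallback when replication-factor is absent (matches the log).
DEFAULT_REPLICATION_FACTOR = 1


def sink_params(sink_uri: str) -> dict:
    """Return the sink URI's query parameters as {key: first value}."""
    query = urlsplit(sink_uri).query  # everything after the first '?'
    return {key: values[0] for key, values in parse_qs(query).items()}


def effective_replication_factor(sink_uri: str) -> int:
    params = sink_params(sink_uri)
    return int(params.get("replication-factor", DEFAULT_REPLICATION_FACTOR))


broken = ("kafka://XXX:9092/test3?message.max.bytes=2147483648"
          "?partition-num=3?replication-factor=3")
fixed = ("kafka://XXX:9092/test3?message.max.bytes=2147483648"
         "&partition-num=3&replication-factor=3")

if __name__ == "__main__":
    for uri in (broken, fixed):
        print(sink_params(uri))
        print("replication-factor seen:", effective_replication_factor(uri))
```

With ? as the separator, the whole tail after the first = becomes the value of message.max.bytes, so partition-num and replication-factor never reach TiCDC as separate parameters and the default replication factor is used instead.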
