The TiCDC parameter partition-num in TiDB 7.1.1 is ineffective

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb7.1.1的Ticdc参数partition-num无效

| username: Cloud王

Bug Report
Clearly and accurately describe the issue you found. Providing any steps to reproduce the issue can help the development team address it promptly.
[TiDB Version]
TiDB 7.1.1

[Impact of the Bug]
The partition-num parameter of TiCDC is ineffective, making it impossible to limit the number of Kafka partitions that data is synchronized to.

[Possible Steps to Reproduce the Issue]

  1. Create a Kafka topic with 3 partitions (example shell commands for steps 1 and 2 are sketched after the note below).

  2. Create a TiCDC changefeed:

curl -X POST http://XXX.XXX.XXX.XXX:8300/api/v2/changefeeds -d '{"changefeed_id":"test-patition","start_ts": 445150430914150401,"sink_uri":"kafka://XXX.XXX.XXX.XXX:9092/kafka-task1?enable-tidb-extension=true&partition-num=1&replication-factor=1&max-message-bytes=67108864","replica_config": {"filter": {"rules":["test_cdc.*"]},"ignore_ineligible_table": true,"mounter": {"worker_num": 16},"sink": {"dispatchers": [{"matcher": ["test_cdc.*"],"partition": "index-value"}],"protocol": "canal-json"}}}'

  3. Result
     With 3 topic partitions, partition-num=1 and partition-num=2 are ineffective: all 3 Kafka partitions receive synchronized data.

Note:
Whether replication-factor is set to 1 or 3 does not affect the result.
Using API v2 or the command line cdc cli changefeed create yields the same result.
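
For reference, the two setup steps can also be run from a shell. This is a minimal sketch, not the exact commands used above: it assumes Kafka's bundled kafka-topics.sh script is on the PATH, uses the cdc cli equivalent of the API call (the --server flag as in the 7.1 cli; older versions used --pd), passes the canal-json protocol in the sink URI instead of the request body, and keeps the same placeholder addresses, start-ts, and topic name. The filter and dispatcher settings from the curl body would go into a separate --config TOML file, omitted here.

# Step 1: create the target topic with 3 partitions (placeholder broker address)
kafka-topics.sh --bootstrap-server XXX.XXX.XXX.XXX:9092 --create --topic kafka-task1 --partitions 3 --replication-factor 1

# Step 2: create the changefeed via cdc cli; partition-num=1 is the parameter under test
cdc cli changefeed create --server=http://XXX.XXX.XXX.XXX:8300 --changefeed-id="test-patition" --start-ts=445150430914150401 \
  --sink-uri="kafka://XXX.XXX.XXX.XXX:9092/kafka-task1?enable-tidb-extension=true&partition-num=1&replication-factor=1&max-message-bytes=67108864&protocol=canal-json"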

[Observed Unexpected Behavior]
All 3 Kafka partitions receive synchronized data.

[Expected Behavior]
Only 1 Kafka partition should receive synchronized data, and the other two partitions should not receive any data.

[Related Components and Specific Versions]
TiDB 7.1.1, installed via tiup: 3 TiDB nodes, 3 TiKV nodes, 3 TiCDC nodes

[Other Background Information or Screenshots]

| username: 随缘天空 | Original post link

The number of partitions in the Kafka topic should match the value of the partition-num parameter; try setting the Kafka topic to 1 or 2 partitions. If the topic has 3 partitions, TiCDC will synchronize data to all of them regardless of the partition-num setting.
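
If you want to test with a 1-partition or 2-partition topic, keep in mind that Kafka can only increase a topic's partition count, never decrease it, so going from 3 partitions down to 1 means deleting and recreating the topic. A rough sketch with the same placeholder broker address and topic name, assuming topic deletion is enabled on the brokers (delete.topic.enable=true):

# drop the existing 3-partition topic, then recreate it with a single partition
kafka-topics.sh --bootstrap-server XXX.XXX.XXX.XXX:9092 --delete --topic kafka-task1
kafka-topics.sh --bootstrap-server XXX.XXX.XXX.XXX:9092 --create --topic kafka-task1 --partitions 1 --replication-factor 1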

| username: heiwandou | Original post link

Try keeping the partition counts consistent.

| username: Cloud王 | Original post link

So what is the point of this parameter? No matter what value it is set to, data is synchronized to all partitions. Wouldn't the effect be the same if the parameter were removed?

| username: Cloud王 | Original post link

Additionally, I tried TiDB 4.0.15: with a Kafka topic that has 3 partitions, setting partition-num=1 in TiCDC makes it synchronize to only 1 partition, not all 3.

| username: 随缘天空 | Original post link

First, try setting the Kafka partitions to 2 and also set partition-num to 2. Check to see how many partitions it will sync to.
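
One way to check how many partitions actually received data is to read each partition's latest offset: any partition still at offset 0 received no messages. A sketch using the GetOffsetShell tool that ships with Kafka (placeholder broker address; on newer Kafka releases the same information is available via kafka-get-offsets.sh):

# --time -1 prints the latest offset of every partition in the topic
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list XXX.XXX.XXX.XXX:9092 --topic kafka-task1 --time -1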

| username: Cloud王 | Original post link

TiDB 7.1.1 vs. TiDB 4.0.15 test results:

| partition-num | Topic partitions | Partitions receiving data (TiDB 7.1.1) | Partitions receiving data (TiDB 4.0.15) |
| --- | --- | --- | --- |
| 3 | 3 | 3 | 3 |
| 2 | 3 | 3 | 2 |
| 1 | 3 | 3 | 1 |
| 2 | 2 | 2 | 2 |
| 1 | 2 | 2 | 1 |

| username: Fly-bird | Original post link

Kafka also needs to be adjusted, right?

| username: Cloud王 | Original post link

What needs to be adjusted in Kafka? Is it the number of partitions?

Assuming Kafka has 10 partitions and I only want to synchronize data to one of them, setting partition-num=1 should fulfill that requirement.
Version 4.0.15 did in fact behave this way.
In version 7.1.1, however, this parameter no longer has any effect.

| username: 随缘天空 | Original post link

From your screenshot, it looks like in higher versions the number of partitions that receive data is determined mainly by the number of topic partitions: if partition-num is less than the topic's partition count, the parameter is ignored. In lower versions it is determined by partition-num. You could try setting partition-num higher than the number of topic partitions and compare the behavior across versions.
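
A quick way to run that comparison is to reuse the earlier changefeed command and simply set partition-num above the topic's 3 partitions, for example 5. This is just a sketch with the same placeholder addresses; test-partition-5 is an illustrative changefeed ID:

cdc cli changefeed create --server=http://XXX.XXX.XXX.XXX:8300 --changefeed-id="test-partition-5" \
  --sink-uri="kafka://XXX.XXX.XXX.XXX:9092/kafka-task1?enable-tidb-extension=true&partition-num=5&replication-factor=1&max-message-bytes=67108864&protocol=canal-json"

As the next reply shows, this case is rejected at changefeed creation rather than silently ignored.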

| username: Cloud王 | Original post link

All attempts failed with an error once the partition-num value was greater than the number of partitions in the Kafka topic. The error message is the same in every case:
[CDC:ErrKafkaNewSaramaProducer][CDC:ErrKafkaInvalidPartitionNum] the number of partitions (5) specified in sink-uri is more than that of the actual topic (3)

| username: 像风一样的男子 | Original post link

If the test results are really as you described, this is definitely a bug. You can raise an issue.

| username: 随缘天空 | Original post link

From your error message, it looks like partition-num exceeds the number of topic partitions, so presumably this value cannot exceed the topic's partition count. There might be an issue with this in higher versions; you could ask the official team for confirmation.

| username: Cloud王 | Original post link

Issue has been raised: Invalid Ticdc parameter partition-num for Tidb7.1.1 · Issue #9952 · pingcap/tiflow · GitHub
Thank you all!

| username: 有猫万事足 | Original post link

Under normal circumstances, you should see a Warn-level log entry:

number of partition specified in sink-uri is less than that of the actual topic.
Some partitions will not have messages dispatched to

Is this message present in your logs? If not, check whether the TiCDC version and TiDB version are consistent.
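
A quick way to check is to grep the TiCDC logs for that warning. The log path below is a placeholder; the actual location depends on the deployment (for tiup clusters it is usually log/cdc.log under the CDC node's deploy directory):

grep -i "partitions will not have messages dispatched" /path/to/cdc.log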

| username: Cloud王 | Original post link

This warning appeared in the logs.

| username: fubinz | Original post link

Thank you for reporting this issue. It has been transferred to the tiflow repo for further follow-up.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

| username: Billmay表妹 | Original post link

This issue will be fixed in version 7.1.6.