TiCDC Data Synchronization with Kafka: Table-level DDLs Are Not Partitioned by the Database Name + Table Name Hash and Are Only Distributed to Partition 0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiCDC数据同步Kafka,表级的DDL不会经过库名+表名Hash算Partition,只会分发到Partition0分区

| username: ShengFeng

[TiDB Usage Environment] Gray Test Environment
[TiDB Version] v5.4.0
[Reproduction Path] Create a Kafka topic with 32 partitions, set the partition dispatcher to partition = "table", then execute DDL on several tables, such as alter table add column. The DDL CDC messages all land in partition 0, while the DML CDC messages are correctly hashed to their partitions based on database name + table name (a rough sketch of this dispatch behavior is included below).
[Encountered Problem: Phenomenon and Impact] With the partition dispatcher set to partition = "table", running DDL and DML on the same table causes problems because the table schema changes. Since the DDL CDC messages and the DML CDC messages are sent to different partitions, they are consumed out of order, and once the schema has changed the out-of-order DML messages fail to be consumed.
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]
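
For reference, a minimal sketch of what the "table" partition dispatch amounts to (the hash below is purely illustrative; TiCDC's actual implementation may differ): every DML for the same database + table name hashes to one fixed partition, while the DDL observed here always goes to partition 0.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// pickPartition mimics the idea behind the "table" partition dispatcher:
// all row changes for the same schema.table land in the same partition.
// crc32 is used only for illustration; TiCDC's real hash may differ.
func pickPartition(schema, table string, partitionNum int32) int32 {
	h := crc32.ChecksumIEEE([]byte(schema + "." + table))
	return int32(h % uint32(partitionNum))
}

func main() {
	// DML events: hashed by database name + table name.
	fmt.Println("test.t1 ->", pickPartition("test", "t1", 32))
	fmt.Println("test.t2 ->", pickPartition("test", "t2", 32))

	// DDL events (as observed with canal-json): always partition 0,
	// regardless of which table they belong to.
	fmt.Println("DDL on test.t1 -> partition 0")
}
```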

| username: xfworld | Original post link

The scenario and problem described are inconsistent with what the documentation describes, so there might not be a good solution. :rofl:

I suggest using version 6.1.X to test this.

Based on the information you provided, regardless of which constraints or partitioning strategy you use, none of them meet the requirement of broadcasting DDL to every partition.

| username: ealam_小羽 | Original post link

Temporary solution:
Put messages that fail to be consumed into a separate queue and keep retrying them.
Normally, the DML can be applied once the DDL has finished executing.
Details to consider:
Choose the retry interval based on how long the DDL takes and on the requirements of the business scenario, to avoid pointless retries.
Avoid infinite loops: if a message still fails after several retries, it may be necessary to raise an alert and have someone confirm and resolve the issue manually.
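
A minimal sketch of such a retry queue in Go (the message type, the downstream consume function, and the retry limits below are hypothetical and should be tuned to the actual DDL duration and business requirements):

```go
package main

import (
	"fmt"
	"time"
)

// Message is a simplified stand-in for a decoded DML event from Kafka.
type Message struct {
	Key     string
	Retries int
}

// Hypothetical limits: tune them to how long your DDL usually takes.
const (
	maxRetries    = 10
	retryInterval = 30 * time.Second
)

// consume is a placeholder for applying the DML downstream; it fails
// while the table schema is still being changed.
func consume(m Message) error {
	// ... write the row change to the downstream system ...
	return nil
}

func main() {
	// Buffered so re-enqueueing inside the worker does not block it.
	retryQueue := make(chan Message, 1024)

	// Retry worker: waits, retries, and alerts instead of looping forever.
	go func() {
		for m := range retryQueue {
			time.Sleep(retryInterval)
			if err := consume(m); err == nil {
				continue
			}
			m.Retries++
			if m.Retries >= maxRetries {
				fmt.Printf("alert: %s still failing after %d retries, needs manual confirmation\n", m.Key, m.Retries)
				continue
			}
			retryQueue <- m
		}
	}()

	// In the main consumer loop, DML that fails goes to the retry queue.
	m := Message{Key: "test.t1#row-1"}
	if err := consume(m); err != nil {
		retryQueue <- m
	}

	time.Sleep(time.Second) // keep the sketch alive briefly
}
```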

| username: dba-kit | Original post link

We also have a similar scenario. I suggest directly modifying the Golang consumer example provided by CDC; it already handles the case where DDL is only distributed to partition 0 and is quite mature. You can refer to the code example below:
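
A rough sketch of that coordination idea (this is not the official TiCDC consumer example; the types and names below are hypothetical): DDL events, which only arrive on partition 0, gate the DML read from all partitions until the DDL has been applied downstream.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a simplified CDC event; Ts is its commit timestamp.
type Event struct {
	Ts  uint64
	SQL string // DDL statement, or a description of the change
}

// ddlGate coordinates DDL (seen only on partition 0) with DML read from
// all partitions: DML with a commit-ts newer than a pending DDL is held
// back until that DDL has been applied downstream.
type ddlGate struct {
	mu          sync.Mutex
	pendingDDLs []Event // DDLs received but not yet applied, sorted by Ts
}

func (g *ddlGate) AddDDL(e Event) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.pendingDDLs = append(g.pendingDDLs, e)
	sort.Slice(g.pendingDDLs, func(i, j int) bool {
		return g.pendingDDLs[i].Ts < g.pendingDDLs[j].Ts
	})
}

// CanApplyDML reports whether a DML with the given commit-ts may be
// applied now; otherwise the caller buffers it and retries later.
func (g *ddlGate) CanApplyDML(ts uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return len(g.pendingDDLs) == 0 || ts < g.pendingDDLs[0].Ts
}

// ApplyOldestDDL pops and "executes" the oldest pending DDL.
func (g *ddlGate) ApplyOldestDDL() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if len(g.pendingDDLs) == 0 {
		return
	}
	fmt.Println("apply DDL:", g.pendingDDLs[0].SQL)
	g.pendingDDLs = g.pendingDDLs[1:]
}

func main() {
	g := &ddlGate{}
	g.AddDDL(Event{Ts: 100, SQL: "ALTER TABLE test.t1 ADD COLUMN c INT"})

	fmt.Println(g.CanApplyDML(90))  // true: committed before the pending DDL
	fmt.Println(g.CanApplyDML(110)) // false: must wait for the DDL
	g.ApplyOldestDDL()
	fmt.Println(g.CanApplyDML(110)) // true once the DDL has been applied
}
```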

| username: ShengFeng | Original post link

The cause has been identified: it is related to the protocol specified for the MQ sink. With the canal-json protocol, DDL is only sent to partition 0, whereas with the open-protocol, DDL is broadcast to every partition. It is not clear why it is designed this way, but this behavior is not very friendly to downstream consumers that need ordered consumption.
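
To make the difference concrete, a rough illustration of the two DDL dispatch behaviors, using a hypothetical producer interface in place of a real Kafka client:

```go
package main

import "fmt"

// producer abstracts "send this payload to this partition"; a real
// implementation would wrap a Kafka client.
type producer interface {
	Send(partition int32, payload []byte) error
}

// sendDDLCanalJSON mirrors the behavior observed with the canal-json
// protocol: the DDL event goes only to partition 0.
func sendDDLCanalJSON(p producer, ddl []byte) error {
	return p.Send(0, ddl)
}

// sendDDLOpenProtocol mirrors the behavior observed with open-protocol:
// the DDL event is broadcast to every partition, so each partition's
// consumer sees the schema change before the DML that follows it.
func sendDDLOpenProtocol(p producer, partitionNum int32, ddl []byte) error {
	for i := int32(0); i < partitionNum; i++ {
		if err := p.Send(i, ddl); err != nil {
			return err
		}
	}
	return nil
}

type printProducer struct{}

func (printProducer) Send(partition int32, payload []byte) error {
	fmt.Printf("partition %d <- %s\n", partition, payload)
	return nil
}

func main() {
	ddl := []byte("ALTER TABLE test.t1 ADD COLUMN c INT")
	_ = sendDDLCanalJSON(printProducer{}, ddl)
	_ = sendDDLOpenProtocol(printProducer{}, 4, ddl)
}
```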

| username: ShengFeng | Original post link

This is the open-protocol, where DDL events can be broadcast to all partitions. With the canal-json protocol, however, they are only sent to partition 0. I still don’t understand why the two message protocols differ in this way.

| username: ShengFeng | Original post link

I have considered this solution, but it may cause data issues: not every affected DML will report an error and enter the retry queue.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.