Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Canal-JSON protocol sends DDL only to the partition with index 0, so DML > DDL > DML consumption order cannot be guaranteed
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Issue Phenomenon and Impact]
Why was it designed so that the Open Protocol broadcasts DDL to every partition, while the Canal-JSON protocol sends DDL only to the partition with index 0?
As shown in the figure below:
As I understand the figure above, TiCDC guarantees the DML > DDL > DML order when it outputs events downstream. The order is correct on the TiCDC side, but when the sink's Kafka topic has multiple partitions, the Canal-JSON protocol sends the DDL only to the partition with index 0. A consumer therefore cannot be sure that all DMLs preceding the DDL, some of which may sit in other partitions, have been consumed before it consumes the DDL. As a result, the order cannot be guaranteed on the consumer side.
According to the description in the TiCDC Canal-JSON Protocol documentation, Canal-JSON is a data exchange format defined by Alibaba's Canal. In the Canal-JSON protocol, DDL events are sent only to the partition with index 0 because the protocol was originally designed for MySQL, where DDL operations are executed at the instance level, so sending the DDL event to a single partition is sufficient.
In TiDB, however, DDL operations are executed at the table level, so with the Open Protocol TiCDC broadcasts DDL events to every partition to ensure that DML > DDL > DML events reach the downstream in order.
Because of this limitation of the Canal-JSON protocol, when the sink's Kafka topic has multiple partitions, there is no guarantee that all DMLs before a DDL are consumed before the DDL itself, which may lead to ordering issues on the consumer side. To solve this problem, it is recommended to use the Open Protocol to consume the events output by TiCDC so that they are consumed in order.
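To illustrate why broadcasting the DDL to every partition helps, here is a minimal sketch of a consumer that treats the broadcast DDL as a barrier. It is a simulation under stated assumptions, not TiCDC or Kafka client code: partitions are plain Python lists, events are hypothetical `(kind, payload)` tuples, and `consume_with_ddl_barrier` is a made-up name.

```python
from typing import List, Tuple

# Hypothetical event shape for this sketch: (kind, payload), kind is "dml" or "ddl".
Event = Tuple[str, str]


def consume_with_ddl_barrier(partitions: List[List[Event]]) -> List[str]:
    """Apply events so that a DDL runs only after every partition has reached it.

    Assumes the same DDL event appears once in every partition (broadcast).
    """
    applied: List[str] = []
    positions = [0] * len(partitions)
    while True:
        # Drain DMLs from each partition until it is parked on a DDL or exhausted.
        for p, events in enumerate(partitions):
            while positions[p] < len(events) and events[positions[p]][0] == "dml":
                applied.append(events[positions[p]][1])
                positions[p] += 1
        parked = [p for p, events in enumerate(partitions) if positions[p] < len(events)]
        if not parked:
            return applied  # every partition is fully consumed
        if len(parked) != len(partitions):
            raise ValueError("a partition ended before the broadcast DDL arrived")
        ddls = {partitions[p][positions[p]][1] for p in parked}
        if len(ddls) != 1:
            raise ValueError("partitions disagree on the next DDL")
        # All partitions reached the same DDL: apply it once, then move past it.
        applied.append(ddls.pop())
        positions = [pos + 1 for pos in positions]


partitions = [
    [("dml", "a1"), ("ddl", "ALTER"), ("dml", "a2")],
    [("dml", "b1"), ("dml", "b2"), ("ddl", "ALTER"), ("dml", "b3")],
]
order = consume_with_ddl_barrier(partitions)
```

If the DDL existed only in the first list, the consumer of the second list would have no marker telling it where to pause, which is the ordering problem raised in the question.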
Hello, I have a question. When the dispatcher is configured as dispatcher = "table", how is the partition calculated from the hash of the schema name and table name?
When the dispatcher is configured as dispatcher = "table": the algorithm for hash calculation based on the schema name and table name
My cousin has already replied to this question.
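For readers who only want the general idea behind table-based dispatching, here is a minimal sketch. It assumes a CRC32 hash over the schema name and then the table name, taken modulo the partition count; the hash function and the name `table_partition` are assumptions for illustration, and the hash TiCDC actually uses may differ.

```python
import zlib


def table_partition(schema: str, table: str, partition_num: int) -> int:
    """Map a (schema, table) pair to a partition index (illustrative only)."""
    if partition_num <= 0:
        raise ValueError("partition_num must be positive")
    # Feed the schema name, then the table name, into one running checksum.
    # CRC32 is an assumption here, not necessarily what TiCDC uses.
    checksum = zlib.crc32(schema.encode("utf-8"))
    checksum = zlib.crc32(table.encode("utf-8"), checksum)
    return checksum % partition_num


# Every event of the same table lands in the same partition.
first = table_partition("test", "orders", 3)
second = table_partition("test", "orders", 3)
```

Because the result depends only on the schema and table names, all DMLs of one table stay in a single partition and keep their relative order there, while different tables may be spread across partitions.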