Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: cdc 发送到Kafka 的消息协议 (Message protocols for CDC sending to Kafka)
Excuse me, teachers,
The message protocol that CDC uses when sending to Kafka accepts the values default, canal, avro, and maxwell (the default value is default). What do these message protocols represent? Canal and Maxwell are also the names of two tools that can read the MySQL binlog. Are the protocols related to those tools in any way?
Additionally, what does the default format of the message protocol specifically represent?
The default format refers to protobuf. If you are using canal, you can choose between canal and canal-json formats for parsing and receiving. If you are using maxwell, you can only choose maxwell. The above protocols and formats are different and cannot be mixed.
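To make the configuration side concrete, here is a minimal Python sketch of how the protocol choice can be carried in a Kafka sink URI as a `protocol` query parameter. The helper name `build_sink_uri`, the host, port, and topic are assumptions for illustration, and the set of accepted protocol names is only what this thread mentions; check the documentation for your TiCDC version.

```python
from urllib.parse import urlencode

# Protocol names mentioned in this thread; the accepted set depends on the TiCDC version.
KNOWN_PROTOCOLS = {"default", "canal", "canal-json", "avro", "maxwell"}


def build_sink_uri(host, port, topic, protocol):
    """Build a Kafka sink URI carrying the chosen message protocol (hypothetical helper)."""
    if protocol not in KNOWN_PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    query = urlencode({"protocol": protocol})
    return f"kafka://{host}:{port}/{topic}?{query}"


# Host, port, and topic are placeholder values.
print(build_sink_uri("127.0.0.1", 9092, "cdc-topic", "canal-json"))
```

Whatever value is placed here is the format that every consumer of that topic has to decode.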
So, if CDC sends data to Kafka first, and then Canal or Maxwell retrieves data from Kafka, should the format be chosen as Canal or Maxwell?
What components primarily use the protobuf format?
Yes, after configuring the protocol through CDC, data will be transmitted to Kafka in the given format. When receiving data from Kafka, you must use the same format; otherwise, how would you parse it?
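The point about matching formats can be sketched in Python as follows. The two decoders assume the commonly documented JSON shapes: canal-json carries an upper-case "type" and a list of rows under "data", while Maxwell carries a lower-case "type" and a single row object under "data". The sample payload is hand-written and abbreviated, and the function names are hypothetical.

```python
import json


def decode_canal_json(payload):
    """Decode a canal-json style row message into a list of row events."""
    msg = json.loads(payload)
    # canal-json puts a list of row objects under "data".
    if not isinstance(msg.get("data"), list):
        raise ValueError("not a canal-json row message: 'data' is not a list")
    return [{"op": msg["type"], "table": msg["table"], "row": row} for row in msg["data"]]


def decode_maxwell(payload):
    """Decode a Maxwell style row message into a list with one row event."""
    msg = json.loads(payload)
    # Maxwell puts a single row object under "data".
    if not isinstance(msg.get("data"), dict):
        raise ValueError("not a maxwell row message: 'data' is not an object")
    return [{"op": msg["type"].upper(), "table": msg["table"], "row": msg["data"]}]


# The consumer must pick the decoder that matches the protocol the producer used.
DECODERS = {"canal-json": decode_canal_json, "maxwell": decode_maxwell}


def decode(protocol, payload):
    return DECODERS[protocol](payload)


# Hand-written sample in the Maxwell shape (abbreviated).
maxwell_msg = json.dumps(
    {"database": "test", "table": "t", "type": "insert", "ts": 1449786310, "data": {"id": 1}}
).encode()

print(decode("maxwell", maxwell_msg))
try:
    decode("canal-json", maxwell_msg)
except ValueError as exc:
    print("mismatch:", exc)
```

Both formats happen to be JSON here, so the mismatch shows up as a structural error; with a binary format such as avro, the JSON parse itself would fail.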
Protobuf (Protocol Buffers) is a language-neutral serialization format developed by Google. gRPC uses it as its default interface definition language and message format.
So, may I ask, after CDC pushes data to Kafka, and then Flink consumes data from Kafka, why can the format be set to canal-json at this time?
Besides supporting canal-json, does Flink also support Maxwell?
Already answered… Please digest it yourself~
Also, refer more to the official documentation.
A message format is, in principle, just a data format, like JSON or XML: as long as the receiver knows how to parse it, it can be used. TiCDC supports Canal and Maxwell because they are common CDC message formats. If your business already uses either of them, you can keep using it without learning another protocol format. For the specific message formats, refer to: TiCDC Avro Protocol | PingCAP 文档中心
Did you get it wrong? Canal is a tool that captures and parses incremental MySQL binlog data and can deliver the change records to MQ systems such as Kafka or RocketMQ. Its upstream should be MySQL, and it serves as a binlog parsing tool.
The so-called canal-json protocol just means that CDC reuses Canal's message format. Essentially, CDC parses TiDB’s operation logs and then writes the content to Kafka as JSON in that format.
As for how you consume from Kafka, that depends on you.
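As one possible way of consuming, here is a minimal Python sketch that applies canal-json style messages to an in-memory copy of a table. It assumes the usual canal-json fields ("isDdl", "type", "pkNames", "data", "old"); the function name and the sample messages are hypothetical, and a real consumer would read the payloads from Kafka rather than from literals.

```python
import json


def apply_canal_json(table, payload):
    """Apply one canal-json style message to `table`, a dict keyed by primary-key tuple."""
    msg = json.loads(payload)
    if msg.get("isDdl"):
        return  # schema changes are ignored in this sketch
    pk_names = msg["pkNames"]
    rows = msg["data"]
    # "old" holds previous column values for UPDATE; it may be null for other types.
    olds = msg.get("old") or [{}] * len(rows)
    for row, old in zip(rows, olds):
        key = tuple(row[name] for name in pk_names)
        if msg["type"] == "DELETE":
            table.pop(key, None)
            continue
        if msg["type"] == "UPDATE":
            # If the primary key itself changed, drop the entry stored under the old key.
            old_key = tuple(old.get(name, row[name]) for name in pk_names)
            if old_key != key:
                table.pop(old_key, None)
        table[key] = row


# Hand-written sample messages (abbreviated; values are strings as in canal-json).
base = {"database": "test", "table": "t", "pkNames": ["id"], "isDdl": False}
insert = json.dumps({**base, "type": "INSERT", "data": [{"id": "1", "name": "a"}], "old": None})
update = json.dumps({**base, "type": "UPDATE", "data": [{"id": "1", "name": "b"}], "old": [{"name": "a"}]})
delete = json.dumps({**base, "type": "DELETE", "data": [{"id": "1", "name": "b"}], "old": None})

replica = {}
for message in (insert, update):
    apply_canal_json(replica, message)
print(replica)
```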
Thank you for the guidance, teacher.
That’s the first time someone has called me a teacher. I’m also a TiDB newbie, so let’s learn together~