Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Discrepancy in blob field values transmitted to a Kafka topic by TiDB CDC
【TiDB Usage Environment】Testing environment
【TiDB Version】5.3.1.0
【Problem Encountered】The blob field value that TiCDC transmits to the Kafka topic differs from the expected value.
For example, when we read the blob field value manually and write it to the Kafka topic, consuming the topic yields “3Amp(”, whereas the data that TiCDC synchronizes to Kafka is consumed as “3Am<87>p(”. What causes this difference?
Which protocol did you set when synchronizing with TiCDC?
The default protocol and canal-json produce the same result.
Hello, canal-json is a text-based serialization protocol and may not represent blob fields well. You can try open-protocol to handle blob fields and serialize string-type fields.
Hello, we also tried open-protocol and got the same result. The dirty data issue is still there.
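To compare raw bytes across protocols, the consumer first has to undo whatever text encoding the protocol applied to the blob. The sketch below assumes, based on my reading of the TiCDC documentation, that open-protocol carries blob values as Base64 strings and that canal-json maps each byte to the ISO-8859-1 character with the same code point. Both assumptions should be checked against the documentation for the TiCDC version in use, and the function names are hypothetical.

```python
import base64


def blob_from_open_protocol(value: str) -> bytes:
    """Recover raw bytes, assuming the blob value is a Base64 string."""
    return base64.b64decode(value)


def blob_from_canal_json(value: str) -> bytes:
    """Recover raw bytes, assuming one ISO-8859-1 character per byte."""
    return value.encode("latin-1")


# Illustrative values only: the same five visible characters plus byte 0x87.
raw = b"3Am\x87p("
as_base64 = base64.b64encode(raw).decode("ascii")
as_latin1 = raw.decode("latin-1")

print(blob_from_open_protocol(as_base64) == raw)
print(blob_from_canal_json(as_latin1) == raw)
```

If either assumption holds, printing the message value directly as text will not match the bytes stored in the database until it is decoded this way.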
It can’t really be called dirty data; it is just a display issue. Please print the content of the blob field in hexadecimal and compare it with the hexadecimal value of the field in the database to check whether they are consistent.
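A minimal sketch of that hexadecimal comparison in Python follows. The two byte strings are reconstructed from the values quoted in the question, assuming “<87>” stands for the single byte 0x87; on the database side, the equivalent check would be something like `SELECT HEX(col)` on the blob column. The last step illustrates one possible cause, which is only a hypothesis and not confirmed in this thread: a manual read path that converts the blob to a UTF-8 string while ignoring invalid bytes would silently drop 0x87.

```python
def to_hex(data: bytes) -> str:
    """Return the bytes as a lowercase hexadecimal string."""
    return data.hex()


def first_difference(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


manual = b"3Amp("          # value seen when writing to Kafka manually
from_ticdc = b"3Am\x87p("  # value seen from TiCDC, assuming <87> is byte 0x87

print(to_hex(manual))      # 33416d7028
print(to_hex(from_ticdc))  # 33416d877028
print(first_difference(manual, from_ticdc))  # 3

# Hypothesis only: a lossy UTF-8 round trip removes the invalid byte 0x87.
lossy = from_ticdc.decode("utf-8", errors="ignore").encode("utf-8")
print(lossy == manual)     # True
```

If the hexadecimal from the database matches the TiCDC output rather than the manual output, the manual path is the one altering the data.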
Our testers compared the two different sets of data and found that the hexadecimal data is different.