There is a discrepancy in the value of the blob field transmitted from TiDB CDC to the Kafka topic

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb cdc blob字段值传输到kafka topic中的值存在差异

| username: DataPipeline-应用组

【TiDB Usage Environment】Testing environment
【TiDB Version】5.3.1.0
【Problem Encountered】The blob field value that TiCDC writes to the Kafka topic differs from the value we expect.
For example, when we read the blob field value manually and write it to the Kafka topic, the consumer sees “3Amp(”. When TiCDC synchronizes the same data to Kafka, the consumer sees “3Am<87>p(”. What causes this difference?

| username: TammyLi

Which protocol did you configure when synchronizing with TiCDC?

| username: DataPipeline-应用组

The default protocol and canal-json both produce the same result.

| username: Min_Chen

Hello, canal-json is a serialized protocol and may not display blob fields well. You can try open-protocol to handle blob fields and serialize string-type fields.

| username: DataPipeline-应用组

Hello, we have also tried open-protocol, and the result is the same. The dirty data issue is still there.

| username: Min_Chen

This cannot be considered dirty data; it is just a display issue. Please print the content of the blob field in hexadecimal, then compare it with the hexadecimal display of the same field in the database to check whether they are consistent.
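
A minimal Python sketch of such a hexadecimal comparison follows. The two byte strings are hypothetical samples built from the values quoted in this thread, on the assumption that “<87>” is how the consumer tool renders the single byte 0x87. The helper names `hex_dump` and `first_difference` are illustrative only and are not part of TiCDC or any Kafka client. The last line shows one way a text-based read could silently drop such a byte; this is an assumption for illustration, not a confirmed diagnosis of this case.

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return data.hex(" ")


def first_difference(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if both are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common prefix: equal, or one is a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Hypothetical sample values, based on the strings quoted in this thread.
manual_value = b"3Amp("      # value seen after the manual read and write
ticdc_value = b"3Am\x87p("   # value seen after TiCDC synchronization

print(hex_dump(manual_value))
print(hex_dump(ticdc_value))
print(first_difference(manual_value, ticdc_value))

# Assumption for illustration: decoding as UTF-8 while ignoring errors
# drops the invalid byte 0x87, so the text no longer matches the raw bytes.
lossy_text = ticdc_value.decode("utf-8", errors="ignore")
print(lossy_text)
```

Running the same dump on the raw bytes read from the database (for example, from the output of `SELECT HEX(col)`) and on the raw message bytes consumed from Kafka shows whether the stored value itself contains the extra byte.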

| username: DataPipeline-应用组

Our testers compared the two sets of data and found that the hexadecimal values are different.