TiCDC Synchronization to Kafka

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc同步到kafka (TiCDC synchronization to Kafka)

| username: Hacker_Petiu56G

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.2.3
[Reproduction Path]
TiCDC replicates to Kafka. The table has a longtext field whose values contain characters such as < and >, but in the messages delivered to Kafka they appear as escape sequences like \u003e\u003d, which do not match the data in the database.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


| username: xfworld | Original post link

Later versions of TiCDC support the binary type.

If you are using the Canal-JSON protocol, binary is supported in TiDB versions after v5.4.1.

| username: Hacker_Petiu56G | Original post link

Is it escaped to \u003e because the name field is defined as varchar? Would only binary columns avoid the escaping?

| username: xfworld | Original post link

The encoding definitions for types like varchar, longtext, text, and binary are different, so the handling is also different.

If varchar can meet the requirements, switching to varchar will avoid this issue.

| username: Hacker_Petiu56G | Original post link

I tested in the v5.2.3 environment: none of varchar, longtext, or binary work (the values are still escaped). Is binary supported after v5.4.1? Is there any other way to synchronize data to Kafka without the content being changed?

| username: xfworld | Original post link

You can test it on a version after v5.4.1. Canal-JSON received some enhancements in v5.4.0, but I’m not completely sure.

Binary should definitely work.

| username: Hacker_Petiu56G | Original post link

TiCDC cannot be upgraded on its own, right? Can it only be upgraded together with TiDB? Upgrading the entire cluster in a short time is not really feasible for us. Is there any other way to synchronize the data to Kafka without it being altered?

| username: xfworld | Original post link

Yes, only a unified upgrade of the whole cluster is possible. If you do upgrade, v6.1 LTS is recommended.


Alternatively, use the native protocol or Avro.

You will need to write your own consumer client to handle the messages.

| username: Hacker_Petiu56G | Original post link

I upgraded to v6.1.2 but still see the escaping, regardless of the column type (longtext, binary, char/varchar) or the protocol (Avro, Canal-JSON, Maxwell).


| username: xfworld | Original post link

This is normal. The messages need to be processed on the receiving (consumer) side.

Refer to this post:

The last paragraph:
canal-json decoder reference implementation:
tiflow/canal_json_decoder.go at master · pingcap/tiflow (github.com)

It’s basically hard mode~ :cowboy_hat_face:
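To illustrate why the escaping is "normal" and why it can be undone on the receiving side, the following sketch uses only Go's standard encoding/json package; it is not TiCDC code. By default, json.Marshal escapes <, > and & as \u003c, \u003e and \u0026 so the output is safe to embed in HTML, while a json.Encoder with SetEscapeHTML(false) writes them literally. That TiCDC's encoders behave the same way is an assumption here, and this snippet does not account for every escaped character seen in the thread (for example \u003d). What holds in general is that \uXXXX sequences are standard JSON string escapes, so any compliant JSON parser turns them back into the original characters. The row type and its name field are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// row is a hypothetical stand-in for one replicated row; the "name"
// field is for illustration only.
type row struct {
	Name string `json:"name"`
}

// encodeDefault uses json.Marshal, which HTML-escapes <, > and &.
func encodeDefault(r row) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// encodeNoHTMLEscape uses a json.Encoder with SetEscapeHTML(false),
// so <, > and & are written literally.
func encodeNoHTMLEscape(r row) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(r); err != nil {
		return "", err
	}
	// Encode appends a trailing newline; trim it off.
	return string(bytes.TrimRight(buf.Bytes(), "\n")), nil
}

// decodeName parses a JSON document and returns its name field.
// Any \uXXXX escape is converted back to the original character.
func decodeName(msg string) (string, error) {
	var r row
	if err := json.Unmarshal([]byte(msg), &r); err != nil {
		return "", err
	}
	return r.Name, nil
}

func main() {
	original := row{Name: "a <> b >= c"}

	escaped, err := encodeDefault(original)
	if err != nil {
		panic(err)
	}
	fmt.Println(escaped) // {"name":"a \u003c\u003e b \u003e= c"}

	literal, err := encodeNoHTMLEscape(original)
	if err != nil {
		panic(err)
	}
	fmt.Println(literal) // {"name":"a <> b >= c"}

	decoded, err := decodeName(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded == original.Name) // true
}
```

The producer-side switch only helps when you control the encoder; with TiCDC output as it is, parsing the JSON on the consumer side is what restores the original text.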

| username: Hacker_Petiu56G | Original post link

Doesn’t TiCDC write directly to Kafka? Where would the processing happen on the receiving side?
Do you have any examples?

| username: xfworld | Original post link

TiCDC does write directly to Kafka, but when your application consumes the data from Kafka, it has to decode the messages itself.

TiCDC standard protocol has examples:
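Independently of those examples, here is a minimal sketch of consumer-side handling in Go: it parses the value of a single Canal-JSON message with encoding/json, and during parsing escapes such as \u003e\u003d become >= again. Reading from Kafka itself is not shown; any Kafka client that hands you the message value as bytes would do. The struct declares only a few Canal-JSON fields and assumes that column values in data arrive as JSON strings or null, as in the Canal format. The sample message and the helper columnValue are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canalJSONMessage models a small, assumed subset of a Canal-JSON message.
// Column values are assumed to be JSON strings or null.
type canalJSONMessage struct {
	Database string               `json:"database"`
	Table    string               `json:"table"`
	Type     string               `json:"type"`
	Data     []map[string]*string `json:"data"`
}

// decodeCanalJSON parses one Kafka message value. json.Unmarshal converts
// escapes such as \u003e back into the original characters.
func decodeCanalJSON(value []byte) (*canalJSONMessage, error) {
	var msg canalJSONMessage
	if err := json.Unmarshal(value, &msg); err != nil {
		return nil, err
	}
	return &msg, nil
}

// columnValue (hypothetical helper) returns the value of col in row i.
// It reports false if the row or column is missing or the value is NULL.
func columnValue(msg *canalJSONMessage, i int, col string) (string, bool) {
	if i < 0 || i >= len(msg.Data) {
		return "", false
	}
	v, ok := msg.Data[i][col]
	if !ok || v == nil {
		return "", false
	}
	return *v, true
}

func main() {
	// Hypothetical message value; the JSON text contains \u003e\u003d.
	value := []byte(`{"database":"test","table":"t1","type":"INSERT",` +
		`"data":[{"id":"1","content":"a \u003e\u003d b"}]}`)

	msg, err := decodeCanalJSON(value)
	if err != nil {
		panic(err)
	}
	v, ok := columnValue(msg, 0, "content")
	fmt.Println(msg.Database, msg.Table, msg.Type, v, ok) // test t1 INSERT a >= b true
}
```

Because the decoded strings already contain the original characters, they can be written to the downstream system as-is; no separate un-escaping step is needed.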