TiCDC transfers data to Kafka with a single record containing information from two different tables

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc传输数据到kafka中一条record信息中包含了两张不同表的信息

| username: DataPipeline-应用组

【TiDB Environment】Test environment
【TiDB Version】5.3.1
【Encountered Problem】
When TiCDC transfers data to Kafka, a single record contains information from two different tables. Is this normal, and is there any setting to avoid it?


【Reproduction Path】Create a TiCDC task from TiDB to Kafka
【Problem Phenomenon and Impact】

| username: songxuecheng | Original post link

  1. Please share the configuration file for synchronization.
  2. Is the requirement that one table corresponds to one topic?
| username: DataPipeline-应用组 | Original post link

The destination is Kafka. In our local testing, the key and value of a record usually contain information from a single table, so we parse the key of the Kafka record to determine the table name. Now, however, some record keys contain the names of two tables, which causes data from different tables to be processed as data from the same table. We also have a question about the value.

Some values start with D{{"u":{"AccountAttr":{ and others with DY{"u":{"AccountAttr":{. I would like to ask under what circumstances it is D{{ and under what circumstances it is DY{.

| username: DataPipeline-应用组 | Original post link

We hope that the key and value of a record in the Kafka topic only contain information from the same table.

| username: songxuecheng | Original post link

You only need to set the required tables for this.

| username: DataPipeline-应用组 | Original post link

Hello, I have configured it this way, but the key of a record’s data contains information from both tables a_a_n613b and a_a_n_613a.
Could you please explain the record value data?
Some values start with D{{"u":{"AccountAttr":{ and others with DY{"u":{"AccountAttr":{. I would like to know under what circumstances it is D{{ and under what circumstances it is DY{.
Is there any related documentation for this?

| username: songxuecheng | Original post link

For the first question: if you only want the data of a single table to appear in the topic, you can configure the filter to match a single table.
For the second question: please explain in more detail.

| username: DataPipeline-应用组 | Original post link

The first question is whether to start a separate TiCDC task for each table or to configure all tables in the configuration file like this:
[filter]
rules = ['dp_test.*a_tt', 'dp_test.a_a_n613b', 'dp_test.a_a_n_613a']
Is this configuration sufficient?

The second question is about consuming data from the topic. The value retrieved is shown in the image. I would like to ask under what circumstances does it start with “D{{” and under what circumstances does it start with “DY{”?

| username: songxuecheng | Original post link

I am not sure about this either, it looks like all the data is being inserted.

| username: songxuecheng | Original post link

You can take a look at this example in the pingcap/tiflow repository on GitHub (master branch): tiflow/examples/java/src/test/java/com/pingcap/ticdc/cdc/TicdcEventDecoderTest.java

| username: neilshen | Original post link

“DY” is the length of the value: the bytes of the length prefix happen to display as ASCII characters.

For more details, see: TiCDC Open Protocol | PingCAP Docs
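
The two prefixes asked about earlier in the thread are consistent with this. As a minimal sketch, assuming the values follow the TiCDC Open Protocol layout in which each event value is preceded by an 8-byte big-endian length, the visible bytes "D{" and "DY" are simply the low-order bytes of two different lengths, and the "{" that follows them is the start of the JSON. The function name length_prefix is hypothetical and only used for illustration.

```python
import struct

def length_prefix(printable_tail: bytes) -> int:
    """Decode an 8-byte big-endian length whose leading bytes are zero.

    printable_tail is the visible part of the prefix, e.g. b"DY".
    The zero bytes in front of it are usually not shown by a console.
    """
    raw = printable_tail.rjust(8, b"\x00")  # restore the invisible zero bytes
    (length,) = struct.unpack(">Q", raw)    # ">Q" = big-endian unsigned 64-bit
    return length

# "D{{..." : length bytes end in 'D' (0x44) and '{' (0x7B), then the JSON begins with '{'
print(length_prefix(b"D{"))  # 17531
# "DY{..." : length bytes end in 'D' (0x44) and 'Y' (0x59), then the JSON begins with '{'
print(length_prefix(b"DY"))  # 17497
```

Under that assumption, the two prefixes only indicate that the two event values have different byte lengths; they do not mark different kinds of records.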

| username: DataPipeline-应用组 | Original post link

Hello, for example, we have configured a TiCDC task to synchronize data from three tables: A, B, and C.
[filter]
rules = ['dp_test.A', 'dp_test.B', 'dp_test.C']
The problem we are encountering now is that, in some records in the topic, the key contains information from both tables A and B, and the value also contains data from both tables A and B, all within the same record.
We believe that the key and value of a record should only contain changes from one table, not two.
Could you please advise on how to avoid this situation? Is there something wrong with our settings? Thank you.

| username: songxuecheng | Original post link

With this configuration, that behavior is normal.

| username: DataPipeline-应用组 | Original post link

Excuse me, what configuration ensures that the key and value of each record in the topic contain information from only one table?

If that cannot be ensured, then when we parse a record whose key contains information from two tables, how can the keys and values of the two tables be matched one by one? Any advice is appreciated. Thank you :pray:
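
One possible way to do the matching, sketched under the assumption that the task uses the TiCDC Open Protocol: the record key starts with an 8-byte protocol version followed by length-prefixed event keys, the record value is a sequence of length-prefixed event values, and the i-th key describes the i-th value. The helper names (split_frames, decode_record, encode_frames) and the sample column data are hypothetical; the key fields "scm" and "tbl" are taken from the Open Protocol documentation.

```python
import json
import struct
from collections import defaultdict

def split_frames(buf: bytes, offset: int = 0) -> list:
    """Split a buffer of [8-byte big-endian length][payload] frames."""
    frames = []
    while offset < len(buf):
        (n,) = struct.unpack_from(">Q", buf, offset)
        offset += 8
        frames.append(buf[offset:offset + n])
        offset += n
    return frames

def decode_record(key: bytes, value: bytes) -> dict:
    """Group the events of one Kafka record by (schema, table)."""
    keys = split_frames(key, offset=8)  # skip the 8-byte protocol version
    values = split_frames(value)
    if len(keys) != len(values):
        raise ValueError("key and value contain different numbers of events")
    by_table = defaultdict(list)
    for k, v in zip(keys, values):      # i-th key pairs with i-th value
        meta = json.loads(k)
        # Assumption: an event without a payload is kept as None.
        payload = json.loads(v) if v else None
        by_table[(meta.get("scm"), meta.get("tbl"))].append(payload)
    return dict(by_table)

def encode_frames(payloads) -> bytes:
    """Demo helper: build length-prefixed frames from JSON-serializable objects."""
    out = b""
    for p in payloads:
        body = json.dumps(p).encode()
        out += struct.pack(">Q", len(body)) + body
    return out

# Synthetic record holding one row change from table A and one from table B.
key = struct.pack(">Q", 1) + encode_frames([
    {"ts": 1, "scm": "dp_test", "tbl": "A", "t": 1},
    {"ts": 2, "scm": "dp_test", "tbl": "B", "t": 1},
])
value = encode_frames([
    {"u": {"id": {"t": 3, "v": 1}}},
    {"u": {"id": {"t": 3, "v": 2}}},
])
events = decode_record(key, value)
print(sorted(events))  # [('dp_test', 'A'), ('dp_test', 'B')]
```

In a real consumer, the raw key and value bytes of each Kafka record would be passed to decode_record, and each table's events would then be handled separately.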

| username: songxuecheng | Original post link

[filter]
rules = ['dp_test.A']
Either synchronize only one table per task (that is, split the three tables into three tasks), or use version 6.1, which provides a topic dispatcher: https://docs.pingcap.com/zh/tidb/stable/manage-ticdc#topic-分发器

| username: DataPipeline-应用组 | Original post link

I want to confirm whether TiCDC version 5.3.0 supports the Canal-JSON protocol.

| username: DataPipeline-应用组 | Original post link

Is there any documentation available on Canal support in version 5.3.0?

| username: songxuecheng | Original post link

It is not supported in version 5.3.

| username: songxuecheng | Original post link

Canal is supported.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.