Multiple Changefeeds in TiCDC Contain the Same Table

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: ticdc多个changefeed包含相同的表

| username: HACK

【Encountered Problem】
I created two changefeeds named cdc1 and cdc2.
Both changefeeds are configured to replicate the same table, testdb.test.

In this case, does only one changefeed replicate the data of the test table at any given time?

| username: Meditator | Original post link

  1. I haven’t verified this, but in theory only one processor on one capture would handle this table, because the table_id of testdb.test is globally unique and the owner hashes tables to captures based on that table_id.
  2. If only one task (changefeed) is running and the sinks are different, wouldn’t that mean only one sink is working? I don’t have an environment at hand right now; I’ll test it later.

| username: HACK | Original post link

I haven’t tested the case of multiple sinks.
In my environment there is only one downstream TiDB cluster, and two changefeeds replicate the same upstream table at the same time. I haven’t found any anomalies.

| username: Meditator | Original post link

Use cdc cli changefeed query to check the status of each task and see whether its checkpoint is advancing.
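
As a rough aid for reading the checkpoint values that the query returns, the sketch below decodes a TSO into wall-clock time and checks whether a checkpoint moved forward between two samples. It relies on the TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names and the sample values are hypothetical and are not taken from this thread or from the TiCDC API.

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def tso_to_datetime(tso: int) -> datetime:
    """Convert a TSO to the UTC time encoded in its physical part."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def checkpoint_advanced(previous_tso: int, current_tso: int) -> bool:
    """Return True if the checkpoint moved forward between two samples."""
    return current_tso > previous_tso


def lag_seconds(checkpoint_tso: int, now_ms: int) -> float:
    """Approximate replication lag of a checkpoint relative to now_ms."""
    return (now_ms - (checkpoint_tso >> LOGICAL_BITS)) / 1000


if __name__ == "__main__":
    # Hypothetical samples for the two changefeeds in this thread.
    base_ms = 1_700_000_000_000
    samples = {
        "cdc1": ((base_ms << LOGICAL_BITS) | 1, ((base_ms + 1000) << LOGICAL_BITS) | 3),
        "cdc2": ((base_ms << LOGICAL_BITS) | 1, ((base_ms + 1200) << LOGICAL_BITS) | 7),
    }
    for name, (before, after) in samples.items():
        print(name, tso_to_datetime(after), checkpoint_advanced(before, after))
```

If both changefeeds report advancing checkpoints, each one is replicating independently rather than one standing in for the other.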

| username: neilshen | Original post link

No, both changefeeds will replicate the data of the test table at the same time.

| username: HACK | Original post link

For example, when an insert is performed on the test table, what internal mechanism deduplicates it during downstream replication? Without one, the insert would be applied downstream twice.

| username: BinLi1988 | Original post link

But you did configure two changefeeds, didn’t you? So isn’t the data replicated twice?

| username: HACK | Original post link

I recreated a table without a primary key and observed that 2 identical records were indeed generated downstream. In addition, the CDC logs showed the DDL being executed twice when the table was created.

A table with a primary key is also replicated downstream successfully, but I found no primary key conflict messages in the logs.

| username: neilshen | Original post link

It won’t deduplicate; changefeeds are independent of each other. They don’t know which tables other changefeeds are capturing or where they are syncing to.

"After recreating a table without a primary key, I did see 2 identical records downstream. In addition, the CDC log showed the DDL being executed twice when the table was created."

This does happen. If two changefeeds have the same downstream, both DML and DDL will be synced redundantly.
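
The behavior reported in this thread can be sketched with a toy model that uses an in-memory SQLite database as the shared downstream. This is only an illustration, not TiCDC code. The upsert-style write for the primary key case is an assumption: it is one plausible reason why no primary key conflict appeared in the logs, but the thread does not confirm it. The names apply_insert and replicate are hypothetical.

```python
import sqlite3


def apply_insert(conn: sqlite3.Connection, row: tuple, upsert: bool) -> None:
    """Apply one replicated insert to the downstream table."""
    # Assumption: with a primary key the sink writes in an upsert style.
    verb = "INSERT OR REPLACE" if upsert else "INSERT"
    conn.execute(f"{verb} INTO test (id, val) VALUES (?, ?)", row)


def replicate(has_primary_key: bool) -> int:
    """Replay one upstream insert through two independent changefeeds
    and return the number of rows that end up downstream."""
    conn = sqlite3.connect(":memory:")
    id_col = "id INTEGER PRIMARY KEY" if has_primary_key else "id INTEGER"
    conn.execute(f"CREATE TABLE test ({id_col}, val TEXT)")
    upstream_event = (1, "a")
    # Each changefeed applies the same event; neither knows about the other.
    for _changefeed in ("cdc1", "cdc2"):
        apply_insert(conn, upstream_event, upsert=has_primary_key)
    (count,) = conn.execute("SELECT COUNT(*) FROM test").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print("no primary key:", replicate(False))
    print("primary key:   ", replicate(True))
```

In this model the table without a primary key ends up with two identical rows, while the table with a primary key keeps a single row because the second write overwrites the first.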

| username: xfworld | Original post link

Then it wasn’t being used according to the guidelines, and that will cause a lot of problems.

| username: HACK | Original post link

Got it, thank you.
