Discussion on TiCDC Data Synchronization Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TICDC同步数据问题讨论

| username: jaybing926

[TiDB Usage Environment] Production Environment
[TiDB Version] v4.0.9
[Encountered Problem: Phenomenon and Impact]
Scenario: The upstream TiDB has two databases, db1 and db2. A new downstream TiDB is created. db1 is backed up with BR first, and db2 is backed up with BR afterwards.
BR restore is then used to restore the db1 and db2 data to the downstream TiDB.
Problem: When TiCDC is used to replicate data from the upstream TiDB to the downstream TiDB, can replication start from the BackupTS of db1, without specifying a database? Is this feasible? Will it succeed or fail with an error?
There is a time gap between the db1 and db2 backups, and new data is written during that period. If replication starts from the BackupTS of db1, will some db2 data be written a second time, or will the task fail outright?
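
To put a number on that gap, the two BackupTS values can be decoded. A TiDB TSO packs a physical timestamp in milliseconds into its high bits and an 18-bit logical counter into its low bits. The sketch below is only an illustration: the function names and the two example timestamps are made up and are not taken from this cluster.

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def compose_tso(physical_ms: int, logical: int = 0) -> int:
    """Build a TSO from a Unix time in milliseconds (used here for fake data)."""
    return (physical_ms << LOGICAL_BITS) | logical


def tso_to_datetime(tso: int) -> datetime:
    """Return the wall-clock time (UTC) encoded in a TSO."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def replay_window_seconds(earlier_ts: int, later_ts: int) -> float:
    """Seconds between the physical parts of two TSOs."""
    return ((later_ts >> LOGICAL_BITS) - (earlier_ts >> LOGICAL_BITS)) / 1000


# Hypothetical BackupTS values: db2 was backed up 10 minutes after db1.
db1_backup_ts = compose_tso(1_600_000_000_000)
db2_backup_ts = compose_tso(1_600_000_600_000)

print("db1 BackupTS time:", tso_to_datetime(db1_backup_ts))
print("db2 BackupTS time:", tso_to_datetime(db2_backup_ts))
print("window in seconds:", replay_window_seconds(db1_backup_ts, db2_backup_ts))
```

Starting from the earlier BackupTS replays the db2 changes made inside this window on top of data that the db2 backup already contains. Starting from the later one would instead skip the db1 changes made in the same window.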

| username: GreenGuan | Original post link

Regarding your question, I personally feel:

  1. Is this operation possible?
    Yes. TiCDC is just a tool that captures changes starting from the TSO you give it (as long as that TSO has not been garbage collected).

  2. Will it succeed or report an error?
    I personally feel the task will start successfully, but shortly afterwards, when TiCDC replicates the incremental data of db2, it will conflict with the db2 data already restored from the backup, causing the TiCDC task to stop.

For the above scenario, there are two solutions you can refer to:

  1. Use BR to back up db1 and db2 together, so that both share a single BackupTS (provided they are under the same instance).
  2. If they are not under the same instance, you need to start two TiCDCs to synchronize db1 and db2 separately.

| username: jaybing926 | Original post link

Okay, thanks for the reply.

Can you confirm that the conflict error in point 2 causes the task to stop?

They are on the same instance, but there are many databases and a large amount of data, so backing up and restoring again is quite time-consuming. That is why I wanted to see whether this method would work. If a conflict only produced a warning, or overwrote the data and continued, it would save a lot of trouble.

| username: GreenGuan | Original post link

For tables with primary keys, TiCDC uses “REPLACE INTO” for inserts, so a replayed row overwrites the existing row with the same key. For tables without primary keys, conflicts may occur.
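
The difference can be seen in a small experiment. This is only a sketch: it uses Python's built-in SQLite as a stand-in for the downstream database (SQLite also accepts `REPLACE INTO`), the table names are invented, and it does not reproduce how TiCDC itself writes data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE with_pk (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE no_pk (id INTEGER, v TEXT)")


def apply_rows(table: str, rows) -> None:
    """Write rows with REPLACE INTO, the statement mentioned above."""
    conn.executemany(f"REPLACE INTO {table} (id, v) VALUES (?, ?)", rows)


def row_count(table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


rows = [(1, "a"), (2, "b")]
for table in ("with_pk", "no_pk"):
    apply_rows(table, rows)  # first pass: rows already present from the restore
    apply_rows(table, rows)  # second pass: the same rows replayed by replication

print("with_pk rows:", row_count("with_pk"))  # key match, old row is replaced
print("no_pk rows:", row_count("no_pk"))      # nothing to match on, rows pile up
```

With a primary key the second pass leaves the table unchanged. Without one, the statement has nothing to match on, so every replayed row becomes an extra copy.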

| username: jaybing926 | Original post link

Okay, understood.

If I create a separate CDC task for each of these databases, all with the same data source and destination, will these tasks affect each other?

Our version is relatively old, v4.0.9. Are there any specific configurations in this configuration file that we need to pay special attention to?

| username: GreenGuan | Original post link

In theory, they will not affect each other.
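
The one-task-per-database idea asked about above could be laid out roughly as follows. Treat this as a sketch under assumptions: the PD address, sink URI, file names, and BackupTS values are placeholders, and the `cdc cli changefeed create` flags and the `[filter]` rule syntax should be checked against the documentation for your TiCDC version (v4.0.9 here) before use. The script only prints the config text and commands; it does not run anything.

```python
import shlex


def filter_config(db: str) -> str:
    """Changefeed config text (TOML) that keeps only tables in one database."""
    return f"[filter]\nrules = ['{db}.*']\n"


def create_changefeed_cmd(pd: str, sink_uri: str, start_ts: int, config_path: str):
    """Argument list for creating one changefeed from a given start TSO."""
    return [
        "cdc", "cli", "changefeed", "create",
        f"--pd={pd}",
        f"--sink-uri={sink_uri}",
        f"--start-ts={start_ts}",
        f"--config={config_path}",
    ]


# Placeholder values: replace with the real BackupTS of each database.
backups = {"db1": 419000000000000000, "db2": 419000157286400000}
PD = "http://127.0.0.1:2379"                      # assumed upstream PD address
SINK = "mysql://user:password@127.0.0.1:4000/"    # assumed downstream TiDB

for db, backup_ts in backups.items():
    config_path = f"{db}-changefeed.toml"
    print(f"# contents of {config_path}")
    print(filter_config(db))
    print(shlex.join(create_changefeed_cmd(PD, SINK, backup_ts, config_path)))
    print()
```

Each task starts from the BackupTS of its own database, so neither database has changes replayed from before its backup, and neither misses changes made after it.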

| username: jaybing926 | Original post link

Okay, thank you~
