Does TiCDC support full + incremental replication to Kafka?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Ti CDC是否支持全量+增量复制到kafka

| username: TiDBer_rYOSh9JN

We use TiCDC to synchronize order data to Kafka. The cluster crashed at midnight, and we only discovered the issue in the morning. Is it feasible to restart incremental synchronization from 00:00 of the same day?

| username: tidb菜鸟一只 | Original post link

It depends on whether the data versions from midnight still exist in TiDB or have already been removed by GC. If they have been removed, you can only perform a full + incremental replication to Kafka again.
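
As a rough illustration of the point above, the sketch below estimates whether a resume point is still inside the GC retention window. It is only an approximation under stated assumptions: it treats `tidb_gc_life_time` as a simple sliding window and ignores how the real GC safe point advances, so the authoritative check is still the cluster itself. The function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def likely_still_retained(resume_from: datetime,
                          now: datetime,
                          gc_life_time: timedelta) -> bool:
    """Approximate check: is `resume_from` still within the GC window?

    Assumption: history older than `gc_life_time` may already be
    garbage-collected. The real safe point is maintained by the cluster
    and can differ from this estimate.
    """
    return now - resume_from <= gc_life_time


# Hypothetical timeline: crash at 00:00, discovered at 08:00.
crash = datetime(2024, 1, 1, 0, 0)
found = datetime(2024, 1, 1, 8, 0)

# With the default tidb_gc_life_time of 10 minutes, midnight is out of reach.
print(likely_still_retained(crash, found, timedelta(minutes=10)))  # False

# With a 24-hour window, resuming from midnight is plausible.
print(likely_still_retained(crash, found, timedelta(hours=24)))    # True
```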

| username: TiDBer_rYOSh9JN | Original post link

How to perform full replication to Kafka?

| username: MrSylar | Original post link

Full replication is not supported. Setting start-ts to 0 means replication starts from the current point in time. TiCDC is positioned as an incremental replication tool.

| username: tidb菜鸟一只 | Original post link

TiCDC indeed does not replicate full data, only changes. The full data has to be exported to the downstream first through Dumpling or DM.

| username: xfworld | Original post link

You can read the latest snapshot with a SELECT query and push the rows to Kafka yourself in a compatible format.

Before pushing the full data, you can start TiCDC’s incremental subscription so that no changes are missed.

However, this can produce duplicate data, which is quite tricky to handle.
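
One possible way to handle the duplication on the consumer side is a "latest version wins" rule. The sketch below is not a TiCDC feature; it assumes every message (both the manually pushed snapshot rows and the change events) carries a row key and a commit timestamp, and that snapshot rows are stamped with the snapshot's timestamp. `Row` and `LatestWins` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Row:
    key: str                 # primary key of the row, e.g. "order:1"
    value: Optional[dict]    # None represents a delete
    commit_ts: int           # assumed to be present on every message


@dataclass
class LatestWins:
    """Keeps only the newest version per key; stale or duplicate rows are dropped."""
    state: Dict[str, Row] = field(default_factory=dict)

    def apply(self, row: Row) -> bool:
        current = self.state.get(row.key)
        # Drop anything that is not strictly newer than what we already hold.
        # Deletes stay in `state` as tombstones so an older snapshot row
        # cannot resurrect a deleted record.
        if current is not None and current.commit_ts >= row.commit_ts:
            return False
        self.state[row.key] = row
        return True


store = LatestWins()
store.apply(Row("order:1", {"status": "paid"}, commit_ts=200))  # change event
store.apply(Row("order:1", {"status": "new"}, commit_ts=100))   # late snapshot row, dropped
print(store.state["order:1"].value)
```

The design choice here is to make the consumer idempotent rather than trying to align the snapshot and the changefeed exactly, at the cost of keeping per-key state.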

| username: TiDBer_rYOSh9JN | Original post link

Is there a plan for TiDB CDC to support full data in the future?

| username: TiDBer_rYOSh9JN | Original post link

The issue can be resolved by setting tidb_gc_life_time to 24 hours and creating the changefeed with the following option:

  • --start-ts: Specifies the start TSO for the changefeed. The TiCDC cluster will start pulling data from this TSO. The default is the current time.

| username: linnana | Original post link

TiCDC is an incremental synchronization tool that supports pulling data from a specific point in time.

| username: cassblanca | Original post link

--start-ts: Specifies the start TSO for the changefeed. The TiCDC cluster will begin pulling data from this TSO, defaulting to the current time. As mentioned above, it depends on the GC life time of the TiDB cluster.
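
Since --start-ts expects a TSO rather than a wall-clock time, a small conversion helper can be useful. The sketch below relies on the TSO layout in which the physical time in milliseconds occupies the high bits and the low 18 bits hold a logical counter; the function names and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # low bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_tso(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a TSO with logical part 0."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    millis = (dt - EPOCH) // timedelta(milliseconds=1)
    return millis << LOGICAL_BITS


def tso_to_datetime(tso: int) -> datetime:
    """Recover the physical time (UTC) encoded in a TSO."""
    millis = tso >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=millis)


# Hypothetical example: midnight on 2024-01-01 in UTC+8.
midnight = datetime(2024, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=8)))
start_ts = datetime_to_tso(midnight)
print(start_ts)                   # value to pass as --start-ts
print(tso_to_datetime(start_ts))  # round-trip check, shown in UTC
```

Whether the changefeed actually accepts this start point still depends on the GC life time, as noted above.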

| username: TiDB_C罗 | Original post link

If the data volume is not very large, you can produce the full data set by exporting it from TiDB and then importing it into the downstream.
