TiCDC User Guide & Reference Materials

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiCDC 使用指南&资料大全

| username: Joyz

What is TiCDC?

TiCDC is an incremental data synchronization tool for TiDB. It pulls data change logs from the upstream TiKV, parses them into ordered row-level change data, and outputs that data to the downstream.

TiCDC Application Scenarios

  • Database Disaster Recovery: TiCDC can be used for disaster recovery between homogeneous databases, ensuring eventual consistency of data between the primary and standby clusters after a disaster. Currently, this scenario is supported only when both the primary and standby clusters are TiDB.
  • Data Integration: TiCDC provides the TiCDC Canal-JSON Protocol, which lets other systems subscribe to data changes. It can serve as a data source for scenarios such as monitoring, caching, full-text indexing, data analysis, and master-slave replication between heterogeneous databases.

To quickly understand the basic principles and usage of TiCDC, it is recommended to watch the following training video (33 minutes long):

TiCDC Architecture

TiCDC runs as a stateless node and achieves high availability through the etcd embedded in PD. A TiCDC cluster supports creating multiple synchronization tasks, each synchronizing data to a different downstream.

The system architecture of TiCDC is shown in the following diagram:

System Roles

  • TiKV CDC Component: Only outputs key-value (KV) change logs.
    • Internally assembles KV change logs.
    • Provides an interface to output KV change logs, sending data including real-time change logs and incrementally scanned change logs.
  • capture: The TiCDC running process. Multiple captures form a TiCDC cluster, which is responsible for synchronizing KV change logs.
    • Each capture is responsible for pulling a portion of the KV change logs.
    • Each capture sorts the one or more KV change logs it has pulled.
    • Each capture restores transactions to the downstream, or outputs the data according to the TiCDC Open Protocol.
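
The sorting step performed by a capture can be pictured as merging several streams that are each already ordered by commit timestamp. The following is only a minimal sketch of that idea, not TiCDC's actual sorter; the `KVChange` type and its field names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class KVChange(NamedTuple):
    """Hypothetical shape of one KV change log entry."""
    commit_ts: int  # commit timestamp of the change
    key: str        # row key
    value: str      # new value


def merge_sorted_streams(*streams: Iterable[KVChange]) -> Iterator[KVChange]:
    """Merge streams that are each ordered by commit_ts into one ordered stream."""
    return heapq.merge(*streams, key=lambda change: change.commit_ts)


# Two pulled streams, each already in commit_ts order.
stream_a = [KVChange(1, "k1", "a"), KVChange(4, "k1", "b")]
stream_b = [KVChange(2, "k2", "x"), KVChange(3, "k3", "y")]
merged = list(merge_sorted_streams(stream_a, stream_b))
```

A k-way merge is used here because it only needs to hold one entry per stream in memory at a time, which is why it is a common choice when the inputs are already sorted.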

Synchronization Features

1. Sink Support

Currently, the TiCDC sink module supports synchronizing data to the following downstreams:

  • Databases compatible with the MySQL protocol, with eventual consistency support.
  • Kafka, with output in the TiCDC Open Protocol format. Depending on the consumer, this can achieve row-level order, eventual consistency, or strict transactional consistency.

2. Synchronization Order and Consistency Guarantees

Data Synchronization Order

  • TiCDC guarantees that every DDL/DML is output at least once.
  • TiCDC may resend the same DDL/DML during TiKV/TiCDC cluster failures. For repeated DDL/DML:
    • The MySQL sink can re-execute DDL. A DDL that is reentrant in the downstream (e.g., truncate table) executes successfully; a non-reentrant DDL (e.g., create table) fails, and TiCDC ignores the error and continues synchronization.
    • The Kafka sink sends duplicate messages, but duplicate messages do not violate the Resolved Ts constraint. Users can filter them on the Kafka consumer side.
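
One way a consumer might filter duplicates is to buffer row events until a Resolved Ts arrives, release everything at or below it in commit order, and drop anything that shows up again later with a timestamp that has already been released. The sketch below assumes a single ordered stream and a hypothetical `RowEvent` shape; it is an illustration of the idea, not a TiCDC-provided consumer.

```python
from typing import Dict, List, NamedTuple, Tuple


class RowEvent(NamedTuple):
    """Hypothetical row change event as seen by a consumer."""
    commit_ts: int
    table: str
    row_key: str
    payload: str


class DedupBuffer:
    """Buffers row events and releases them once a Resolved Ts covers them."""

    def __init__(self) -> None:
        self._pending: Dict[Tuple[int, str, str], RowEvent] = {}
        self._flushed_ts = 0  # highest Resolved Ts already released

    def add(self, event: RowEvent) -> None:
        if event.commit_ts <= self._flushed_ts:
            return  # a resend of something already released
        # Same (ts, table, row) seen twice overwrites itself, removing the duplicate.
        self._pending[(event.commit_ts, event.table, event.row_key)] = event

    def on_resolved(self, resolved_ts: int) -> List[RowEvent]:
        ready = sorted(k for k in self._pending if k[0] <= resolved_ts)
        released = [self._pending.pop(k) for k in ready]
        self._flushed_ts = max(self._flushed_ts, resolved_ts)
        return released


buf = DedupBuffer()
buf.add(RowEvent(5, "t", "1", "v5"))
buf.add(RowEvent(3, "t", "1", "v3"))
buf.add(RowEvent(5, "t", "1", "v5"))  # duplicate before the resolved event
first = buf.on_resolved(5)
buf.add(RowEvent(3, "t", "1", "v3"))  # duplicate resent after the resolved event
second = buf.on_resolved(10)
```

With several Kafka partitions, a consumer would presumably need to track the minimum Resolved Ts across partitions before releasing events; that is left out here.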

Data Synchronization Consistency

  • MySQL sink
    • TiCDC does not split single-table transactions, ensuring the atomicity of single-table transactions.
    • TiCDC does not guarantee that the execution order of downstream transactions is completely consistent with the upstream.
    • TiCDC splits cross-table transactions by table, not guaranteeing the atomicity of cross-table transactions.
    • TiCDC ensures that single-row updates are consistent with the upstream update order.
  • Kafka sink
    • TiCDC provides different data distribution strategies, distributing data to different Kafka partitions by table, primary key, or ts.
    • Different consumer implementations under different distribution strategies can achieve different levels of consistency, including row-level order, eventual consistency, or cross-table transaction consistency.
    • TiCDC does not provide a Kafka consumer implementation; it only provides the TiCDC Open Data Protocol, on which users can build their own Kafka data consumers.
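
The distribution strategies for the Kafka sink can be illustrated with a small partition picker. This is a sketch under assumptions: the strategy names, the function `pick_partition`, and the use of CRC32 as the hash are all hypothetical and are not taken from TiCDC's implementation.

```python
import zlib


def pick_partition(strategy: str, table: str, primary_key: str,
                   commit_ts: int, num_partitions: int) -> int:
    """Choose a Kafka partition for one row change (illustrative only)."""
    if strategy == "table":
        basis = table                      # all rows of a table stay together
    elif strategy == "primary_key":
        basis = f"{table}:{primary_key}"   # all versions of a row stay together
    elif strategy == "ts":
        basis = str(commit_ts)             # rows of one transaction stay together
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return zlib.crc32(basis.encode("utf-8")) % num_partitions


p_first = pick_partition("table", "test.t1", "1", 100, 4)
p_second = pick_partition("table", "test.t1", "2", 200, 4)
```

The trade-off is visible in the choice of `basis`: distributing by table keeps a table's changes ordered within one partition, distributing by primary key only keeps each row's changes ordered, and distributing by ts spreads a table's rows across partitions, so the consumer has to reorder them.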

Synchronization Limitations

When using TiCDC for synchronization, note the following limitations and unsupported scenarios.

Requirements for Valid Indexes

TiCDC can only synchronize tables that have at least one valid index. A valid index is defined as follows:

  • Primary key (PRIMARY KEY) is a valid index.
  • Unique index (UNIQUE INDEX) is a valid index if it meets the following conditions:
    • Each column in the index is explicitly defined as non-null (NOT NULL) in the table structure.
    • The index does not contain virtual generated columns (VIRTUAL GENERATED COLUMNS).
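
The rules above can be written down as a small check. The `Column`, `Index`, and `Table` types here are hypothetical stand-ins for a table schema, used only to make the definition concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    name: str
    not_null: bool = False           # explicitly declared NOT NULL
    virtual_generated: bool = False  # VIRTUAL GENERATED column


@dataclass
class Index:
    columns: List[str]
    primary: bool = False
    unique: bool = False


@dataclass
class Table:
    columns: Dict[str, Column]
    indexes: List[Index] = field(default_factory=list)


def is_valid_index(table: Table, index: Index) -> bool:
    """A primary key is always valid; a unique index needs NOT NULL, non-virtual columns."""
    if index.primary:
        return True
    if not index.unique:
        return False
    cols = [table.columns[name] for name in index.columns]
    return all(c.not_null and not c.virtual_generated for c in cols)


def has_valid_index(table: Table) -> bool:
    return any(is_valid_index(table, idx) for idx in table.indexes)


nullable_unique = Table(
    columns={"a": Column("a", not_null=False)},
    indexes=[Index(["a"], unique=True)],
)
strict_unique = Table(
    columns={"a": Column("a", not_null=True)},
    indexes=[Index(["a"], unique=True)],
)
```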

Starting from v4.0.8, TiCDC can synchronize tables without valid indexes if the task configuration is modified, but the data consistency guarantee is weakened. For usage and precautions, refer to Synchronizing Tables Without Valid Indexes.

Unsupported Scenarios

Currently, TiCDC does not support the following scenarios:

Note: Starting from v5.3.0, TiCDC no longer supports circular synchronization.

TiCDC Installation and Deployment

To install TiCDC, you can choose to deploy it with a new cluster or add the TiCDC component to an existing TiDB cluster. For details, refer to TiCDC Installation and Deployment.

TiCDC Cluster Management and Synchronization Task Management

Currently, you can use the cdc cli tool or HTTP interface to manage the TiCDC cluster status and data synchronization tasks. For detailed operations, see:

  1. Using the cdc cli tool to manage cluster status and data synchronization
  2. Using the OpenAPI interface to manage cluster status and data synchronization
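
As a rough illustration of the HTTP route, the sketch below only builds a request for querying node status and does not send it. The address `127.0.0.1:8300` and the path `/api/v1/status` are assumptions based on the v1 OpenAPI as commonly documented; check them against the OpenAPI documentation for your TiCDC version.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Assumed default TiCDC server address; replace with your own.
TICDC_SERVER = "http://127.0.0.1:8300"


def build_status_request(server: str = TICDC_SERVER) -> Request:
    """Build (but do not send) a GET request for the node status endpoint."""
    # "/api/v1/status" is an assumed path; verify it for your version.
    return Request(urljoin(server, "/api/v1/status"), method="GET")


request = build_status_request()
```

Passing `request` to `urllib.request.urlopen` would send it to a running TiCDC server.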

TiCDC Open Data Protocol

The TiCDC Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and master-slave replication of heterogeneous databases. TiCDC follows the TiCDC Open Protocol to replicate TiDB data changes to third-party data media such as MQ (Message Queue).
For detailed information, refer to TiCDC Open Data Protocol.
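
To give a feel for what a consumer does with such messages, here is a hedged sketch that summarizes one event from its JSON key and value. The field names (`ts`, `scm`, `tbl`, `t`, `u`, `d`, `v`) and the type codes are written from memory of the protocol description and should be treated as assumptions; the function `summarize_event` is hypothetical, and the binary framing of real messages is not handled.

```python
import json
from typing import Any, Dict

# Assumed event type codes; verify against the protocol documentation.
EVENT_KINDS = {1: "row_changed", 2: "ddl", 3: "resolved"}


def summarize_event(key_json: str, value_json: str = "{}") -> Dict[str, Any]:
    """Turn one event's JSON key/value pair into a small summary dict."""
    key = json.loads(key_json)
    kind = EVENT_KINDS.get(key.get("t"), "unknown")
    summary: Dict[str, Any] = {"kind": kind, "ts": key.get("ts")}
    if kind == "row_changed":
        value = json.loads(value_json)
        summary["table"] = f'{key.get("scm")}.{key.get("tbl")}'
        # Assumption: "u" carries inserted/updated columns, "d" deleted ones.
        columns = value.get("u") or value.get("d") or {}
        summary["op"] = "delete" if "d" in value else "upsert"
        summary["columns"] = {name: col.get("v") for name, col in columns.items()}
    return summary


row = summarize_event(
    '{"ts": 415508856908021766, "scm": "test", "tbl": "t1", "t": 1}',
    '{"u": {"id": {"t": 3, "h": true, "v": 1}}}',
)
resolved = summarize_event('{"ts": 415508856908021770, "t": 3}')
```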

Compatibility Issues

TiCDC Common Issues and Troubleshooting

For frequently encountered issues when using TiCDC, refer to TiCDC Common Issues.
For troubleshooting issues encountered when using TiCDC, refer to TiCDC Troubleshooting.

Collection of TiCDC Usage Practice Articles

Collection of TiCDC Architecture Analysis Articles

Popular Q&A on TiCDC

If you have any questions related to TiCDC, feel free to ask on Asktug. Before posting, see the Question Search Guide & Asking Guidelines.

| username: ShawnYan

Blog - TiCDC

https://cn.pingcap.com/blog/tag/ticdc/