How to Achieve High Availability in the TiCDC Module

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TICDC模块如何实现高可用

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.0
[Encountered Problem: Problem Phenomenon and Impact] TiCDC can replicate incremental data between TiDB clusters. How can this component achieve high availability? For example, can two TiCDC instances be deployed in a primary-backup mode, where the backup automatically takes over the replication task if the primary fails?

| username: 像风一样的男子 | Original post link

Deploying multiple TiCDC nodes gives you high availability automatically. A TiCDC cluster consists of multiple nodes that periodically report their status to the etcd cluster embedded in PD and elect one of them as the Owner of the TiCDC cluster. The Owner keeps the scheduling state in etcd and writes its scheduling decisions there as well; each Processor carries out its replication tasks according to that state. If the node running a Processor fails, the cluster reschedules its tables to other nodes; if the Owner node fails, the Capture processes on the remaining nodes elect a new Owner.
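For example, once two or more TiCDC nodes are deployed, you can check which capture currently holds the Owner role. A minimal sketch, assuming a TiCDC server is reachable at 127.0.0.1:8300 and that the v1 Open API (`GET /api/v1/captures`) in your version returns `address` and `is_owner` fields:

```python
import requests

# Assumed address of one TiCDC server; adjust to your deployment.
CDC_SERVER = "http://127.0.0.1:8300"

def list_captures():
    """Query the TiCDC Open API for all captures and report the current Owner."""
    resp = requests.get(f"{CDC_SERVER}/api/v1/captures", timeout=5)
    resp.raise_for_status()
    for capture in resp.json():
        role = "owner" if capture.get("is_owner") else "processor"
        print(f"{capture.get('address')}  id={capture.get('id')}  role={role}")

if __name__ == "__main__":
    list_captures()
```

If the node shown as owner goes down, rerunning the script after a short while should show a different node holding the owner role, with no manual failover needed.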

| username: residentevil | Original post link

So I would need to deploy TiCDC in cluster mode. Another question: if TiCDC consumes the upstream redo log, is that consumption idempotent? For example, if a single TiCDC instance fails and I deploy a new TiCDC instance to continue replication, how do I find the resume point? Can I move forward from the moment of the failure, or will TiCDC find the point automatically?

| username: 像风一样的男子 | Original post link

TiCDC reads the change logs from TiKV and replicates them downstream. During this process, a TSO records how far replication has progressed (the checkpoint). If your task fails, you can specify this TSO as the starting point of a new task so it resumes reading the change logs from there.
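As a small illustration of what that checkpoint TSO means: a TiDB TSO packs a physical timestamp (milliseconds since the Unix epoch) in its high bits and an 18-bit logical counter in its low bits, so you can recover the wall-clock time a checkpoint corresponds to. The TSO value below is hypothetical:

```python
from datetime import datetime, timezone

def tso_to_datetime(tso: int) -> datetime:
    """Convert a TiDB/TiCDC TSO to a UTC datetime.

    The high bits hold milliseconds since the Unix epoch;
    the low 18 bits are a logical counter.
    """
    physical_ms = tso >> 18
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)

# Hypothetical checkpoint TSO, e.g. taken from `cdc cli changefeed list`.
checkpoint_tso = 443852055297916932
print(tso_to_datetime(checkpoint_tso))
```

A new changefeed can then be created with that TSO as its start point (for example via the `--start-ts` flag of `cdc cli changefeed create`), provided the TSO is still within the TiKV GC safepoint.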

| username: residentevil | Original post link

Question 1: Is this changelog the WAL log of RocksDB?
Question 2: Where can I check the TSO timestamp, and where can I check the synchronization delay? Is there an API interface?

| username: 像风一样的男子 | Original post link

The KV change log is not the Raft log. It is a stream of row changed events provided by TiKV that hides most of the internal implementation details, and it is delivered on a per-Region basis.
You can find the TSO in the TiCDC logs or in the output of `cdc cli changefeed list`. If there is replication lag, you can see the details in the TiCDC dashboard on Grafana (named after your cluster, here tsp-prod-slave-cluster-TiCDC).
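There is also an HTTP API you can query directly instead of the CLI. A sketch, assuming the v1 Open API (`GET /api/v1/changefeeds`) is available in your TiCDC version and that its response includes `id`, `state`, and `checkpoint_tso` fields:

```python
import requests
from datetime import datetime, timezone

# Assumed TiCDC server address; adjust to your deployment.
CDC_SERVER = "http://127.0.0.1:8300"

def show_changefeed_lag():
    """List changefeeds via the TiCDC Open API and estimate replication lag."""
    resp = requests.get(f"{CDC_SERVER}/api/v1/changefeeds", timeout=5)
    resp.raise_for_status()
    now = datetime.now(timezone.utc)
    for cf in resp.json():
        # The checkpoint TSO packs milliseconds since the epoch in its high bits.
        checkpoint_ms = cf["checkpoint_tso"] >> 18
        checkpoint = datetime.fromtimestamp(checkpoint_ms / 1000, tz=timezone.utc)
        lag_seconds = (now - checkpoint).total_seconds()
        print(f"{cf['id']}: state={cf['state']} "
              f"checkpoint={checkpoint} lag≈{lag_seconds:.1f}s")

if __name__ == "__main__":
    show_changefeed_lag()
```

The same numbers (checkpoint TSO and lag) are what the Grafana TiCDC dashboard plots, so this is mainly useful for scripting or alerting.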

| username: 像风一样的男子 | Original post link

If you want to learn more about TiCDC, you can check out the design documents at https://github.com/pingcap/tiflow/tree/master/docs/design

| username: residentevil | Original post link

Got it, thank you very much.
