Currently, TiKV txn data cannot be replicated via CDC. Can it be achieved through a learner node?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV txn数据目前无法通过cdc的方式进行复制,是否可以通过learner节点实现

| username: 夜-NULL

Currently, TiKV transaction data cannot be replicated using CDC. Is it possible to use the learner role to achieve full data replication, then isolate the learner role node and restore it as a new cluster to achieve full data replication?

The client performs dual-write logic to complete DR. If any cluster fails, full replication is performed directly.

I saw the deployment architecture of two data centers in one city in the official documentation and wanted to see if this idea is feasible?

| username: ddhe9527 | Original post link

I don’t understand. Are you trying to use the learner method to copy a set of data downstream to achieve CDC functionality? Or is it for asynchronous replication for disaster recovery? Since the data is already copied over, why does the client need to write twice? :sweat_smile:

| username: 夜-NULL | Original post link

The main purpose is disaster recovery. Currently, there is no CDC solution, so we hope to achieve a similar effect through other means.

The expectation is to have two completely independent clusters. If full replication can be achieved through the learner method, and then the learner nodes can be separated to form a new cluster, this would essentially complete the full synchronization. Then, the client double-write would maintain the consistency of incremental data.

The main goal is to isolate cluster-level failures, such as when a cluster is overwhelmed by sudden traffic spikes (or other avalanche scenarios), rendering it almost non-functional. If there is a new cluster with identical data, it can ensure that core business can switch to the new cluster, achieving fault isolation. In such cases, even multi-center deployment solutions are ineffective, and circuit breaker mechanisms do not yield good results.

| username: cs58_dba | Original post link

It feels so complicated, it would be better to just lay a fiber optic cable within the same city for direct strong synchronization.

| username: 夜-NULL | Original post link

Uh, how do you operate this? I don’t understand the meaning of “strong synchronization”~

| username: jansu-dev | Original post link

  1. ticdc provides transaction replication, but currently, its capability is maintained at “the downstream cluster can recover within 5 minutes and lose at most 10 seconds of data before the issue occurs, i.e., RTO <= 5 mins, P95 RPO <= 10s.”https://docs.pingcap.com/zh/tidb/stable/manage-ticdc#灾难场景的最终一致性复制

  2. What you want to do should be the DR-autosync feature. You can refer to the following articles. The basic principle is also to use learner and add commit group restrictions. It is not recommended to do it yourself, as single replica recovery is very complex. In some extreme scenarios, data consistency may not be guaranteed in terms of performance.
    Adaptive Synchronization Solution for Dual Data Centers in the Same City - Detailed Explanation of DR Auto-Sync → 专栏 - 同城双中心自适应同步方案 —— DR Auto-Sync 详解 | TiDB 社区
    DR Auto-Sync Setup and Planned Switch Operation Manual → 专栏 - DR Auto-Sync 搭建和计划内切换操作手册 | TiDB 社区

| username: 夜-NULL | Original post link

Okay, thank you, I will study it~

Currently, the sync solutions are all within a single cluster. What we most hope for is to be able to split into two completely isolated clusters through some means.

Also, I would like to ask if there is a rough schedule for the official txn cdc solution~~

| username: jansu-dev | Original post link

The RPO and RTO of CDC are scheduled, and results are expected to come out this year, but the specific outcomes are unknown. You can look forward to it; CDC is a tool that the tools team will continue to invest in and optimize in the future.

| username: 夜-NULL | Original post link

Okay~~thx~

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.