Currently, TiKV transaction data cannot be replicated using CDC. Is it possible to use the learner role to achieve full data replication, then isolate the learner role node and restore it as a new cluster to achieve full data replication?
The client performs dual-write logic to complete DR. If any cluster fails, full replication is performed directly.
I don’t understand. Are you trying to use the learner method to copy a set of data downstream to achieve CDC functionality? Or is it for asynchronous replication for disaster recovery? Since the data is already copied over, why does the client need to write twice?
The main purpose is disaster recovery. Currently, there is no CDC solution, so we hope to achieve a similar effect through other means.
The expectation is to have two completely independent clusters. If full replication can be achieved through the learner method, and then the learner nodes can be separated to form a new cluster, this would essentially complete the full synchronization. Then, the client double-write would maintain the consistency of incremental data.
The main goal is to isolate cluster-level failures, such as when a cluster is overwhelmed by sudden traffic spikes (or other avalanche scenarios), rendering it almost non-functional. If there is a new cluster with identical data, it can ensure that core business can switch to the new cluster, achieving fault isolation. In such cases, even multi-center deployment solutions are ineffective, and circuit breaker mechanisms do not yield good results.
ticdc provides transaction replication, but currently, its capability is maintained at “the downstream cluster can recover within 5 minutes and lose at most 10 seconds of data before the issue occurs, i.e., RTO <= 5 mins, P95 RPO <= 10s.” → https://docs.pingcap.com/zh/tidb/stable/manage-ticdc#灾难场景的最终一致性复制
What you want to do should be the DR-autosync feature. You can refer to the following articles. The basic principle is also to use learner and add commit group restrictions. It is not recommended to do it yourself, as single replica recovery is very complex. In some extreme scenarios, data consistency may not be guaranteed in terms of performance.
Adaptive Synchronization Solution for Dual Data Centers in the Same City - Detailed Explanation of DR Auto-Sync → 专栏 - 同城双中心自适应同步方案 —— DR Auto-Sync 详解 | TiDB 社区
DR Auto-Sync Setup and Planned Switch Operation Manual → 专栏 - DR Auto-Sync 搭建和计划内切换操作手册 | TiDB 社区
Currently, the sync solutions are all within a single cluster. What we most hope for is to be able to split into two completely isolated clusters through some means.
Also, I would like to ask if there is a rough schedule for the official txn cdc solution~~
The RPO and RTO of CDC are scheduled, and results are expected to come out this year, but the specific outcomes are unknown. You can look forward to it; CDC is a tool that the tools team will continue to invest in and optimize in the future.