Questions about the deployment location of the TiCDC component

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于TiCDC组件部署位置的问题

| username: zhanggame1

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.2
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
We want to set up a same-city backup based on TiCDC, with 6 nodes in the primary cluster and 3 nodes in the backup cluster.
Some materials suggest that, to avoid putting pressure on the primary cluster, TiCDC can be deployed on the backup cluster. My question is: if we deploy it this way, is the TiCDC component still added to the primary cluster with TiUP, and simply run on the backup cluster's machines?

| username: Kongdom | Original post link

There should be two clusters, a primary and a backup, with TiUP used to install TiCDC in the backup cluster for data synchronization. I think that's what it means.

  • When using TiCDC to replicate data between two TiDB clusters, if the network latency between upstream and downstream exceeds 100 ms:
    • For versions before v6.5.2, it is recommended to deploy TiCDC in the region (IDC or region) where the downstream TiDB cluster is located.
    • For v6.5.2 and later, which include TiCDC optimizations, it is recommended to deploy TiCDC in the region (IDC or region) where the upstream cluster is located.

| username: zhanggame1 | Original post link

I don't think a TiCDC installed in the backup cluster can be used for this.

| username: Kongdom | Original post link

Why? Is there any difference?

| username: zhanggame1 | Original post link

From the installation perspective, TiCDC is a component of a cluster, so I think only that cluster can be its source.

| username: Jellybean | Original post link

Yes, TiCDC cannot belong to the backup cluster; it has to belong to the main cluster. TiCDC is one of the cluster's components, and it captures the data change logs of the cluster it belongs to. Just think about whose changes it needs to capture: it is obviously the main cluster's data changes that get replicated downstream.

The official documentation says it can be deployed in the backup cluster. My understanding is that the TiCDC nodes can be co-located with the downstream backup cluster, physically using the downstream machines' resources. In terms of management, however, they are still managed by the upstream main cluster's TiUP, and architecturally they remain part of the upstream cluster.
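
As a rough illustration of that layout, the sketch below generates a TiUP scale-out topology whose `cdc_servers` entries point at backup-site machines. The host addresses and the helper name `cdc_scale_out_topology` are hypothetical, and only the `host` and `port` fields are filled in; treat it as a starting point under those assumptions, not a complete topology.

```python
from typing import Iterable


def cdc_scale_out_topology(hosts: Iterable[str], port: int = 8300) -> str:
    """Build a minimal TiUP scale-out topology containing only cdc_servers.

    `hosts` are the machines that will physically run TiCDC. In the layout
    discussed here they sit in the backup site, but the topology is applied
    to the PRIMARY cluster, so TiCDC stays a primary-cluster component.
    """
    lines = ["cdc_servers:"]
    for host in hosts:
        lines.append(f"  - host: {host}")
        lines.append(f"    port: {port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical backup-site addresses, for illustration only.
    print(cdc_scale_out_topology(["10.0.2.11", "10.0.2.12"]), end="")
```

The resulting file would then be passed to `tiup cluster scale-out` together with the primary cluster's name, which is what keeps the new TiCDC nodes under the primary cluster's management even though they run on backup-site hardware.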

| username: Fly-bird | Original post link

Install CDC on the main cluster to synchronize to the secondary cluster.

| username: 像风一样的男子 | Original post link

I guess they mean that the components belong to the main cluster, but the services are deployed on the servers of the secondary cluster.

| username: Kongdom | Original post link

:thinking: So how is that different from running TiCDC on a separate server? I think the emphasis here is on the primary and secondary clusters, which should be two independent clusters. If they were the same cluster, then once more than half of the nodes went down, the secondary cluster couldn't be used either.

But it still depends on the source of the original poster’s information. Could you provide a link?

| username: dba远航 | Original post link

Deploying it in the backup cluster is possible, but whether that is advisable depends on the version.

| username: TiDBer_小阿飞 | Original post link

From an architectural perspective, each Capture process contains one or more Processor threads. You need to weigh the primary/secondary relationship between upstream and downstream, along with the hardware resources and production load on each side.

| username: 小龙虾爱大龙虾 | Original post link

When using the TiCDC component to replicate between a primary cluster and a disaster recovery cluster, best practice is largely driven by network latency, especially when latency between the two availability zones (AZs) is high.

Before TiDB version 6.5.2, it was recommended to place the TiCDC component in the disaster recovery data center to reduce the latency from the TiCDC node to the disaster recovery cluster, thereby speeding up the SQL replay process.

However, starting from version 6.5.2, TiCDC has been optimized to send SQL in batches to the downstream, effectively solving the issue of slow SQL replay due to high latency. Therefore, the current best practice is to place the TiCDC component in the production data center to improve synchronization efficiency.

The TiCDC component always belongs to the primary cluster.
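
The version-dependent rule described above can be written down as a small helper. This is only a sketch of the guidance as stated in this thread: the function names are hypothetical, and returning "no stated preference" at or below 100 ms is an assumption, since the thread only gives a recommendation for higher latencies.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v6.5.2' or '7.1.2' into (6, 5, 2) so versions compare as tuples."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def recommended_ticdc_site(tidb_version: str, latency_ms: float) -> str:
    """Return where to physically place TiCDC, per the rule quoted in the thread."""
    if latency_ms <= 100:
        # Assumption: the quoted guidance only covers latency above 100 ms.
        return "no stated preference"
    if parse_version(tidb_version) < (6, 5, 2):
        # Older versions: keep TiCDC close to the downstream cluster.
        return "downstream"
    # v6.5.2 and later send SQL downstream in batches, so upstream is preferred.
    return "upstream"


if __name__ == "__main__":
    # The original poster runs 7.1.2; 150 ms is an illustrative latency.
    print(recommended_ticdc_site("7.1.2", 150))
```

Either way, the result only describes which site's machines host the TiCDC processes; the component itself remains part of the primary cluster.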

| username: Kongdom | Original post link

:thinking: From this description, it is indeed a component of the main cluster, just in a different physical location.

| username: heiwandou | Original post link

This way, the main cluster's CDC node is deployed on the target cluster's machines.

| username: zhanggame1 | Original post link

I also suspect that this is the case.

| username: Mzb329 | Original post link

Your same-city backup solution is feasible. To avoid putting too much pressure on the primary cluster, you can deploy TiCDC on the secondary cluster's nodes.
