Should CDC be deployed on the source or target end? Or both? Is it better for the version to match the target end or the source end?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: CDC部署在源端还是目标端?还是都可以?版本是要和目标端一样还是和源端一致比较好

| username: 舞动梦灵

Should CDC be deployed on the source end or the target end, or can it go on either? Should its version match the target end or the source end?
The source cluster currently runs version 4.0.2, and we are preparing to migrate to a new cluster on version 5.0.0. Should the CDC used for synchronization be deployed on the target end as version 5.0, or on the source end as version 4.0.2?
If I deploy version 5.0.0 on the target end and let it pull the KV change logs from the 4.0.2 source end to synchronize into 5.0, will there be any issues?

| username: xfworld | Original post link

First, consider upgrading the version, then consider using CDC…

CDC captures change data information from TiKV, so it is better to be closer to the TiDB cluster…

| username: h5n1 | Original post link

To parse the kv change log from the source end, it is best to have the same version as the source end. At the target end, it just executes SQL.

| username: Billmay表妹 | Original post link

It’s best to keep both versions consistent to avoid some issues~

| username: 舞动梦灵 | Original post link

I read the official documentation: version 5.0.0 is compatible with 4.0 clusters, except for 5.0.0-RC. I would also prefer to keep the versions consistent, but I need to migrate now, and 4.0.2 is so old that I don’t dare to upgrade very far in one step. I’ll move to 5.0 during the migration and then gradually upgrade to 6.

| username: 舞动梦灵 | Original post link

On the source end, CDC can only be scaled out and deployed as version 4.0.2, while on the target end only version 5.0 can be deployed.

| username: tidb菜鸟一只 | Original post link

In your case, use version 4.0 first. If both ends were on the same version and there were network latency between them, deploying TiCDC on the downstream side would be recommended.

| username: 舞动梦灵 | Original post link

If you use 4.0, it means you can only deploy it on the source side, right? The source side is version 4.0. The target side is 5.0. It’s not possible to install version 4.0 on the downstream side, right?

| username: tidb菜鸟一只 | Original post link

Yes, if the versions on both sides are inconsistent, then deploy it on the source end.

| username: 大飞哥online | Original post link

Parsing happens on the source side and execution on the target side, so it is still better to match the parsing side.

| username: Fly-bird | Original post link

CDC needs to be deployed at the source end of the cluster, and the target end only requires network connectivity.

| username: zhanggame1 | Original post link

CDC can be deployed on either the source or the target end. The official documentation states that if the primary database is under high load and the network latency is high, it should be placed on the downstream (target) side.

| username: ajin0514 | Original post link

I suggest using version 4.0.

| username: h5n1 | Original post link

It seems that currently, CDC can only be deployed together with the source cluster because it needs to connect to PD and write task information to PD. I haven’t tried deploying CDC on the target side to extract data from the source side. However, CDC deployed on the source side can be placed within the network of the target side.
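As a rough illustration of that last point, the sketch below starts a CDC server on a host inside the target network while registering it with the source cluster's PD. The PD address is the one quoted later in this thread; the advertise address 10.0.20.5:8300, the log file name, and the `run` helper with its DRY_RUN switch are hypothetical. In a TiUP-managed cluster the node would more likely be added through a scale-out of the source cluster than started by hand, so treat this only as a picture of which side CDC talks to.

```shell
#!/bin/sh
# Sketch only: CDC runs on a target-side host but belongs to the SOURCE cluster.
PD_ADDR="${PD_ADDR:-http://10.0.10.25:2379}"         # PD of the source cluster (address from the thread)
ADVERTISE_ADDR="${ADVERTISE_ADDR:-10.0.20.5:8300}"   # hypothetical host in the target network

# run is a hypothetical helper: it prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

start_cdc_server() {
    run cdc server \
        --pd="$PD_ADDR" \
        --addr="0.0.0.0:8300" \
        --advertise-addr="$ADVERTISE_ADDR" \
        --log-file="ticdc.log"
}

start_cdc_server
```

The CDC binary version would still follow the source cluster (4.0.2 here), as discussed above.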

| username: 舞动梦灵 | Original post link

If there are 3 PDs at the source end, which address should I write when creating a task with ctl cdc?

| username: 啦啦啦啦啦 | Original post link

You don’t need to write the PD address, just specify the TiCDC address and the downstream address. TiCDC itself is within the cluster and will automatically find PD.
Refer here:

| username: 舞动梦灵 | Original post link

The task creation template for version 5.0 is: cdc cli changefeed create --pd=http://10.0.10.25:2379 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified". Here --pd specifies the PD (Placement Driver) address of the source, and the URI is the target address. The format you describe, without --pd, looks like the newer one that applies to versions 6.5 and later, including 7.1.

| username: 啦啦啦啦啦 | Original post link

Sorry, the 6.5 version I’m using doesn’t require --pd, but 4.0 does. You just need to specify the address of the PD leader.
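Putting the two posts together, a minimal sketch of the 4.0/5.0-style task creation might look like the following. The addresses, credentials, and flags are copied from the command quoted in this thread. The `run` helper and the DRY_RUN switch are hypothetical, and PD_ADDR is assumed to point at the source cluster's PD leader.

```shell
#!/bin/sh
# Sketch only: values are copied from the command quoted in this thread.
PD_ADDR="${PD_ADDR:-http://10.0.10.25:2379}"                  # PD leader of the SOURCE cluster
SINK_URI="${SINK_URI:-mysql://root:123456@127.0.0.1:3306/}"   # target (downstream) address
CHANGEFEED_ID="${CHANGEFEED_ID:-simple-replication-task}"

# run is a hypothetical helper: it prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

create_changefeed() {
    run cdc cli changefeed create \
        --pd="$PD_ADDR" \
        --sink-uri="$SINK_URI" \
        --changefeed-id="$CHANGEFEED_ID" \
        --sort-engine="unified"
}

create_changefeed
```

Run as-is, this only prints the assembled command; setting DRY_RUN=0 would execute it. Because the flags come from the 5.0 template, it is worth checking cdc cli changefeed create --help on a 4.0.2 binary before relying on them.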

| username: TiDBer_小阿飞 | Original post link

Did it work? Are you using version 5.0 on the target end?

| username: 舞动梦灵 | Original post link

I couldn’t create tasks on the target end, only on the source end. And now even the task created on the source end doesn’t sync~~~ Sigh
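The thread ends without a diagnosis. One possible first check, sketched here rather than taken from the thread, is to ask the source-side PD which CDC nodes are registered and what state the task is in. The PD address and changefeed ID are the ones quoted earlier; the `run` helper and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: inspect the CDC nodes and the replication task via the source PD.
PD_ADDR="${PD_ADDR:-http://10.0.10.25:2379}"                # PD of the source cluster
CHANGEFEED_ID="${CHANGEFEED_ID:-simple-replication-task}"   # task ID from the quoted command

# run is a hypothetical helper: it prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

check_changefeed() {
    run cdc cli capture list --pd="$PD_ADDR"        # which CDC nodes are registered
    run cdc cli changefeed list --pd="$PD_ADDR"     # which tasks exist
    run cdc cli changefeed query --pd="$PD_ADDR" --changefeed-id="$CHANGEFEED_ID"   # details of one task
}

check_changefeed
```

With DRY_RUN=0, the query output would be the place to look for an error message or a checkpoint that has stopped advancing.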