Optimization Issues in DM Synchronization Configuration

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm同步配置优化问题

| username: mono

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.5
I am synchronizing MySQL data to a TiDB cluster. When a source is added to the DM cluster, each data source is bound to one DM worker. My idea is to bind one data source to two workers, and then have two tasks each use one data source. Compared with having both tasks share a single data source, would this improve performance?

| username: 有猫万事足 | Original post link

First of all, it seems that there is currently no way to bind a data source to two workers.

Secondly, your data source is really the upstream binlog, and there is only one copy of it. Even if you bind two workers downstream, both workers are still reading the same binlog, so I doubt this setup would improve performance.

I don't know whether relay log mode will support a master-slave configuration in the future, but in theory it could. With that support, it might become possible to bind multiple workers to a single data source, though it is hard to say how much improvement this would bring, or whether binlog latency would increase in such a configuration.

After all, in principle, relay logs themselves should introduce some latency. They merely avoid multiple reads of the original binlog file.

| username: mono | Original post link

The two data sources point to the same database IP address but have different data source names. Each of the two tasks uses one of these data sources.

| username: Fly-bird | Original post link

It seems that binding one data source to two workers is not supported.

| username: mono | Original post link

I guess I didn’t make myself clear. For example,

This works. source1 and source2 are each bound to their own DM worker, and each of the two tasks specifies one of these sources.
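
For readers following along, the setup described in this post can be sketched roughly as below. This is only an illustration, not the poster's actual configuration: the IP address, user, password placeholder, and task names are hypothetical, and the task files are fragments that omit the required `target-database` section and any routing or filter rules. The field names (`source-id`, `from`, `mysql-instances`) follow the DM source and task configuration file formats.

```yaml
# source1.yaml: first registration of the upstream MySQL (hypothetical values)
source-id: "source1"
from:
  host: "10.0.0.1"        # assumed upstream MySQL address
  port: 3306
  user: "dm_user"         # hypothetical account
  password: "******"      # placeholder, not a real value
---
# source2.yaml: same upstream address, registered under a different source-id
source-id: "source2"
from:
  host: "10.0.0.1"        # same MySQL instance as source1
  port: 3306
  user: "dm_user"
  password: "******"
---
# task1.yaml (fragment): this task reads only from source1
name: "task1"
task-mode: "all"
mysql-instances:
  - source-id: "source1"
---
# task2.yaml (fragment): this task reads only from source2
name: "task2"
task-mode: "all"
mysql-instances:
  - source-id: "source2"
```

Each source would be registered separately (for example with `dmctl operate-source create`), after which DM binds each one to a free worker. Note that both workers still pull the same upstream binlog, so whether this arrangement improves throughput is not established by this thread.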