Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Why must the number of DM workers be at least the number of MySQL instances?
Our company has hundreds of small MySQL instances. I want to use a big data platform to move the existing data into TiDB and then use DM to replicate the incremental data. But why does the documentation require at least as many DM workers as MySQL instances? I am preparing only a few virtual machines, and the documentation also calls for SSDs? Isn't the synchronization done on the fly?
With hundreds of small instances, deploying hundreds of DM worker processes would waste a lot of resources.
Is it one instance per database? If it’s one instance with multiple databases, one DM worker is enough.
Most instances have one database.
DM's architecture works like this: each DM worker corresponds to one upstream MySQL instance and sets up a replication channel with it by acting as a MySQL replica (slave). Any extra DM workers stay idle. If there are many upstream MySQL instances, you can migrate them in batches or co-locate several workers on limited server resources, although performance may suffer. SSDs are recommended for scenarios where the relay log is enabled, because the relay log has high I/O demands. DM workers also generate considerable I/O during normal operation.
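The binding rule described above can be modeled in a few lines. This is a toy sketch, not DM's actual scheduler: the function name `bind_sources` and the worker/source IDs are hypothetical. It only shows why fewer workers than sources is an error while extra workers simply sit idle.

```python
from __future__ import annotations

# Toy model of the one-to-one binding rule (hypothetical, not DM's real code):
# each source needs its own worker; workers beyond the number of sources stay idle.


def bind_sources(sources: list[str], workers: list[str]) -> dict[str, str | None]:
    """Return a mapping worker -> bound source (None means the worker is idle).

    Raises ValueError when there are fewer workers than sources, because a
    worker can be bound to at most one source.
    """
    if len(workers) < len(sources):
        raise ValueError(
            f"{len(sources)} sources need at least {len(sources)} workers, "
            f"got {len(workers)}"
        )
    binding: dict[str, str | None] = {w: None for w in workers}
    for worker, source in zip(workers, sources):
        binding[worker] = source  # one source per worker, never more
    return binding


if __name__ == "__main__":
    print(bind_sources(["mysql-01", "mysql-02"], ["w1", "w2", "w3"]))
    # {'w1': 'mysql-01', 'w2': 'mysql-02', 'w3': None}
```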
Yes, a DM worker can only be bound to one data source. If resources allow, you can run multiple workers on the same machine.
For now, sources and workers map one-to-one. If the load from each upstream instance is small, you can try deploying multiple workers on the same machine.
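Co-locating workers mainly means giving each worker on a host its own port and directories. The sketch below spreads N workers round-robin across a few hosts. It assumes the output should resemble the `worker_servers` section of a TiUP DM topology file and that 8262 is the DM-worker default port; the helper names, host names, and directory layout are hypothetical, so check field names against the TiUP documentation.

```python
from __future__ import annotations

# Hypothetical placement helper: spread N DM-worker instances across a few
# hosts, giving co-located workers distinct ports and directories.
# Output mimics a TiUP DM topology `worker_servers` section (assumption).


def plan_workers(hosts: list[str], n_workers: int, base_port: int = 8262) -> list[dict]:
    if not hosts:
        raise ValueError("need at least one host")
    plan = []
    for i in range(n_workers):
        host = hosts[i % len(hosts)]  # round-robin across hosts
        slot = i // len(hosts)        # index of this worker on its host
        port = base_port + slot       # unique port per worker on the same host
        plan.append({
            "host": host,
            "port": port,
            "deploy_dir": f"/dm-deploy/dm-worker-{port}",
            "log_dir": f"/dm-deploy/dm-worker-{port}/log",
        })
    return plan


def to_yaml(plan: list[dict]) -> str:
    # Hand-rolled YAML so the sketch needs no third-party packages.
    lines = ["worker_servers:"]
    for w in plan:
        lines.append(f"  - host: {w['host']}")
        for key in ("port", "deploy_dir", "log_dir"):
            lines.append(f"    {key}: {w[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(plan_workers(["vm1", "vm2"], 5)))
```

With many workers per host the port range grows accordingly, so check that it does not overlap with other services on the same machine.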
Currently, sources and workers map one-to-one. Future optimizations are planned to decouple them:
FYI: "Tracking issue for dm to supports one worker bound to several sources" (pingcap/tiflow Issue #4687 on GitHub)
Although workers and sources currently must map one-to-one, a DM worker does not actually consume much compute. If the relay log is not enabled, it also does not depend heavily on disk reads and writes (after all, you are only doing incremental migration).
You can have more DM workers than sources, but not fewer: each MySQL instance needs its own DM worker. If you don't enable the relay log, SSDs are not necessary. Still, replicating hundreds of instances at once is quite a hassle.
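To make hundreds of instances more manageable, the per-source configs can be generated and onboarded in batches, with the relay log disabled as suggested above. This is a sketch under assumptions: the field names (`source-id`, `enable-relay`, `from`) follow the DM source configuration file as we understand it and should be confirmed for your DM version; the helpers, user name, and addresses are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: one DM source config per upstream MySQL instance,
# relay log disabled, plus a helper to split sources into onboarding batches.
# Field names are assumptions; verify them against your DM version's docs.


def render_source(source_id: str, host: str, port: int = 3306, user: str = "dm_user") -> str:
    return "\n".join([
        f"source-id: {source_id}",
        "enable-relay: false  # no relay log, so local disk I/O stays low",
        "from:",
        f"  host: {host}",
        f"  port: {port}",
        f"  user: {user}",
        "  password: ''  # placeholder, fill in before use",
    ])


def batches(items: list, size: int) -> list[list]:
    # Split items into consecutive chunks of at most `size`.
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    hosts = [f"10.0.0.{n}" for n in range(1, 6)]  # hypothetical upstream hosts
    for group in batches(hosts, 2):
        for n, h in enumerate(group):
            print(render_source(f"mysql-{h.split('.')[-1]}", h))
            print("---")
```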