Why Is the Minimum Number of DM Workers Greater Than the Number of MySQL Instances?

translator_bot · June 23, 2024, 8:22am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM worker为什么低要求大于mysql实例

| username: TiDBer_hwEZA4rV

Our company has hundreds of small MySQL instances. I want to use a big data platform to load the existing data into TiDB and then use DM for incremental replication. But why does the documentation require the number of workers to be no less than the number of MySQL instances? I was planning to prepare only a few virtual machines, and it also requires SSDs? Isn’t the synchronization done on the fly?

translator_bot · June 23, 2024, 8:22am

| username: hey-hoho | Original post link

The DM worker and the upstream MySQL instance have a one-to-one relationship; one worker cannot serve multiple MySQL instances.
It is recommended to use SSDs for deploying DM workers in a production environment, as worker nodes will have high-frequency disk read and write operations.

translator_bot · June 23, 2024, 8:22am

| username: TiDBer_hwEZA4rV | Original post link

I have hundreds of small instances. Deploying hundreds of DM worker processes would waste too many resources.

translator_bot · June 23, 2024, 8:22am

| username: forever | Original post link

Is it one instance per database? If it’s one instance with multiple databases, one DM worker is enough.

translator_bot · June 23, 2024, 8:22am

| username: TiDBer_hwEZA4rV | Original post link

Most instances have one database.

translator_bot · June 23, 2024, 8:22am

| username: ddhe9527 | Original post link

The architecture of DM works like this: each DM Worker corresponds to one upstream MySQL instance and establishes a replication channel with that MySQL instance in Slave mode. Any extra DM Workers remain idle. If there are many upstream MySQL instances, you can consider synchronizing them in batches, or co-locating several DM Workers on a limited number of servers, although performance might not be very good. SSDs are recommended for scenarios where relay logs are enabled, because relay logs have high I/O demands. DM Workers also generate a fair amount of I/O during normal operation.
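
To make the one-to-one binding and the batch idea concrete, here is a minimal Python sketch. It is only a model of the behavior described above, not DM code: the names `bind_sources` and `batches`, and the source and worker labels, are made up for illustration.

```python
from itertools import islice


def bind_sources(sources, workers):
    """Pair each source with exactly one worker (illustrative model only).

    Returns (bound, idle_workers, waiting_sources):
    - bound maps source -> worker, one-to-one
    - idle_workers are extra workers with no source to serve
    - waiting_sources are sources left without a worker
    """
    n = min(len(sources), len(workers))
    bound = dict(zip(sources[:n], workers[:n]))
    return bound, workers[n:], sources[n:]


def batches(sources, batch_size):
    """Split sources into consecutive batches of at most batch_size."""
    it = iter(sources)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


# Hypothetical example: 7 small MySQL instances, only 3 workers available.
sources = [f"mysql-{i:03d}" for i in range(1, 8)]
workers = [f"dm-worker-{i}" for i in range(1, 4)]

bound, idle, waiting = bind_sources(sources, workers)
print("bound:", bound)
print("idle workers:", idle)
print("sources without a worker:", waiting)

# Migrating in batches sized to the number of workers.
for number, batch in enumerate(batches(sources, len(workers)), start=1):
    print(f"batch {number}: {batch}")
```

With fewer workers than sources, the leftover sources simply have nothing to bind to, which is why the worker count has to cover every source that is replicating at the same time.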

translator_bot · June 23, 2024, 8:22am

| username: Hacker007 | Original post link

Yes, a DM worker can only bind to one data source. If there are enough resources, you can set up multiple workers on the same machine.
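
As a rough sketch of what "multiple workers on the same machine" could look like, the Python below generates a `worker_servers` section for a TiUP DM topology file, giving each worker on a host its own port and directories. The host addresses, the `/dm-deploy` paths, and the idea of counting ports up from 8262 are assumptions for illustration; check the field names and free ports against the TiUP DM topology reference for your version.

```python
def worker_servers(hosts, workers_per_host, base_port=8262):
    """Build one topology entry per worker, with a unique port per host.

    Assumption: ports base_port, base_port + 1, ... are free on every host.
    """
    entries = []
    for host in hosts:
        for i in range(workers_per_host):
            port = base_port + i
            entries.append({
                "host": host,
                "port": port,
                # Hypothetical paths: separate directories so workers
                # on the same machine do not overwrite each other.
                "deploy_dir": f"/dm-deploy/dm-worker-{port}",
                "log_dir": f"/dm-deploy/dm-worker-{port}/log",
            })
    return entries


def render_yaml(entries):
    """Render the entries as a YAML fragment without third-party libraries."""
    lines = ["worker_servers:"]
    for entry in entries:
        lines.append(f"  - host: {entry['host']}")
        lines.append(f"    port: {entry['port']}")
        lines.append(f"    deploy_dir: \"{entry['deploy_dir']}\"")
        lines.append(f"    log_dir: \"{entry['log_dir']}\"")
    return "\n".join(lines)


# Hypothetical example: 2 virtual machines, 3 workers on each.
print(render_yaml(worker_servers(["10.0.1.11", "10.0.1.12"], 3)))
```

The number of workers per machine still has to add up to at least the number of sources you replicate at once, so for hundreds of instances on a few VMs this only works if each instance's load is small.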

translator_bot · June 23, 2024, 8:22am

| username: Cabbager | Original post link

For now, sources and workers correspond one-to-one. If the load of a single upstream instance is not large, you can try deploying multiple workers on the same machine.

translator_bot · June 23, 2024, 8:22am

| username: buchuitoudegou | Original post link

Currently, sources and workers correspond one-to-one. Optimizations to decouple sources from workers are planned for the future:

FYI: Tracking issue for dm to supports one worker bound to several sources · Issue #4687 · pingcap/tiflow · GitHub

Although it is currently necessary to have a one-to-one correspondence between worker and source, in fact, the consumption of computing resources by the DM worker is not very significant; if the relay log function is not enabled, it is not very dependent on disk read and write (after all, you are performing incremental migration).
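
For many small sources, the per-source configuration files with relay log left off can be generated rather than written by hand. The sketch below is a minimal illustration, assuming the DM source configuration keys `source-id`, `enable-relay`, and `from` (`host`, `port`, `user`, `password`); the function name, the `dm_user` account, and the host addresses are hypothetical, and the password is deliberately left as an empty placeholder.

```python
def render_source_config(source_id, host, port=3306, user="dm_user",
                         enable_relay=False):
    """Render one upstream source config as YAML text (illustrative sketch)."""
    lines = [
        f'source-id: "{source_id}"',
        # Relay log off: the worker reads binlog from upstream directly,
        # so it depends much less on local disk I/O.
        f"enable-relay: {str(enable_relay).lower()}",
        "from:",
        f'  host: "{host}"',
        f"  port: {port}",
        f'  user: "{user}"',
        '  password: ""  # placeholder, fill in for your environment',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: three small upstream instances.
upstreams = {
    "mysql-001": "10.0.2.1",
    "mysql-002": "10.0.2.2",
    "mysql-003": "10.0.2.3",
}
for source_id, host in upstreams.items():
    print(render_source_config(source_id, host))
```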

translator_bot · June 23, 2024, 8:22am

| username: 履霜知冰 | Original post link

You can have more DM workers than MySQL instances, but not fewer: each MySQL instance requires its own DM worker. If you don’t enable relay log, SSDs are not necessary. However, synchronizing hundreds of instances at once is indeed quite troublesome.
