Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Can the DM load phase use Lightning's physical mode? If different worker instances are deployed on the same machine, will the dump files be executed repeatedly in the loader phase?
[Test Environment] Testing environment
[TiDB Version] TiDB 5.1.0, DM 6.5.0
[Encountered Problem: Phenomenon and Impact] Data migration in the DM load phase is relatively slow. We are currently using the SQL mode; can Lightning's physical mode be used instead, and is it more efficient? The DM documentation does not mention physical mode, but it appears to be supported in the source code.
Additionally, can different dm-worker instances be deployed on the same machine? If two instances operate on a task's dump directory at the same time, will there be any issues?
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
Question 1: You can use dumpling + lightning for the full synchronization, and then use DM for incremental synchronization starting from the corresponding point in time. This approach is more controllable, and you can configure whatever mode you need when configuring lightning.
When configuring DM, it can also be specified. For details, see the article: Complete Configuration File of DM Tasks | PingCAP Docs
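For the dumpling + lightning route, a minimal tidb-lightning configuration for the physical (local-backend) import mode might look like the sketch below; all paths and addresses are placeholder assumptions, not values from this thread:

```toml
# tidb-lightning.toml — minimal sketch for physical import (local backend).
# All paths and addresses below are placeholders.

[tikv-importer]
# "local" is the physical import mode; "tidb" is the logical (SQL) mode.
backend = "local"
# Local scratch space used to sort key-value pairs before ingest.
sorted-kv-dir = "/data/lightning-sorted-kv"

[mydumper]
# Directory containing the dumpling output (schema + data files).
data-source-dir = "/data/dump"

[tidb]
# Target TiDB instance.
host = "127.0.0.1"
port = 4000
user = "root"
```

The local backend writes sorted key-value data directly into TiKV, which is why it is generally much faster than the SQL-based logical mode, at the cost of the affected tables not being serviceable normally during the import.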
Question 2: Different workers can be placed on the same machine, but when scaling out you need to specify different ports, deployment paths, etc. If resources are sufficient, it is best to place them on separate machines.
Thank you for your reply:
Question 1:
The comment for the import-mode configuration item in the screenshot does not mention that physical mode can be used; only sql and loader are listed.
Question 2:
I have previously deployed different worker instances on one machine, but in the loader phase, will the dumped SQL be executed repeatedly, and will the checkpoint files conflict? For example, in a sharded-database synchronization scenario, the dump path for a task is unique, and there seems to be no way to specify a separate dump path for each worker. If worker1 and worker2 are both executing, worker2 will delete the checkpoint file after it finishes, and worker1 will then be unable to execute.
For the first question, you can use the method I mentioned: full synchronization first, then incremental. This is generally how I do synchronization.
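As a sketch of the "incremental after full" step: assuming the binlog position at dump time is known (the source ID, binlog name, and position below are placeholders), the DM task file could pin the starting point like this:

```yaml
# dm-task.yaml — incremental-only sketch; all names and positions are placeholders.
name: "incremental-after-full"
task-mode: incremental           # the full import was already done with lightning

mysql-instances:
  - source-id: "mysql-01"
    meta:                        # binlog position recorded when the dump was taken
      binlog-name: "mysql-bin.000003"
      binlog-pos: 4
```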
For the second question, would this issue occur with different paths as well? I haven't tested that. I used to place them on different ports of the same machine with different paths and didn't encounter any problems. For example, when scaling out dm-worker, I would configure the dirs as dm-8262 and dm-8263.
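A sketch of what "different ports and different paths" could look like in a tiup dm scale-out topology (the host address and directories are placeholders):

```yaml
# scale-out.yaml — two dm-worker instances on one host; all values are placeholders.
worker_servers:
  - host: 10.0.1.10
    port: 8262
    deploy_dir: "/dm-deploy/dm-worker-8262"
    data_dir: "/dm-data/dm-worker-8262"
  - host: 10.0.1.10
    port: 8263                   # second instance must use a different port
    deploy_dir: "/dm-deploy/dm-worker-8263"
    data_dir: "/dm-data/dm-worker-8263"
```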
In the next version, we will let DM use the lightning physical mode for import. You can try it out then.
The dump_dir needs to use a relative path to avoid conflicts between the two workers.
Does the dumpling+lightning method support data synchronization in a sharded database scenario? I don’t see any table routing configuration in the settings.
When restoring with lightning, you can configure routing like this. Not sure if it meets your needs.
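As a sketch of that kind of routing (the patterns and target names below are placeholders), tidb-lightning's table routing rules can merge sharded schemas and tables at import time:

```toml
# tidb-lightning.toml fragment — shard-merging routes; all names are placeholders.
[[routes]]
schema-pattern = "shard_db_*"    # match all sharded schemas
table-pattern  = "shard_table_*" # match all sharded tables
target-schema  = "merged_db"     # import them into a single schema
target-table   = "merged_table"  # and a single table
```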
You need to use a relative path for dump_dir to prevent conflicts between the two workers.
This requires deploying two workers in different paths on the machine. If I deploy one worker and start two processes, the dump_dir will still conflict, right?
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.