Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Why use DM to quickly import large amounts of data into a TiDB cluster, and why is insert not recommended?
Why is it recommended to use the DM tool to quickly import large amounts of data into a TiDB cluster, rather than inserting the data directly with INSERT statements? Using DM seems more troublesome, since it requires setting up a MySQL database as the data source for synchronization.
Perhaps you misunderstood. DM is used to synchronize data from the source MySQL database to the target TiDB.
There is no such suggestion.
For fast bulk data import, the recommendation is TiDB Lightning in physical import mode.
The DM best practices also emphasize that the data volume should be less than 1 TB.
The main function of DM is to migrate and merge sharded databases and tables, not to do fast imports.
It depends on your source. If it’s MySQL, use DM. If it’s a CSV or SQL file, use Lightning. If it’s TiDB, use TiCDC.
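The rule of thumb in the reply above can be written down as a small lookup. This is only a sketch of that one reply's suggestion, not official guidance; the function name `pick_import_tool` and the source labels are made up for illustration.

```python
def pick_import_tool(source: str) -> str:
    """Return the tool suggested in this thread for a given source type.

    The mapping only restates the rule of thumb from the reply above;
    the keys ("mysql", "csv", "sql", "tidb") are illustrative labels.
    """
    rules = {
        "mysql": "DM",             # replicate from an upstream MySQL
        "csv": "TiDB Lightning",   # import CSV files
        "sql": "TiDB Lightning",   # import SQL dump files
        "tidb": "TiCDC",           # replicate from another TiDB cluster
    }
    key = source.strip().lower()
    if key not in rules:
        raise ValueError(f"no suggestion in this thread for source: {source!r}")
    return rules[key]


if __name__ == "__main__":
    for src in ("MySQL", "csv", "sql", "TiDB"):
        print(src, "->", pick_import_tool(src))
```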
A large amount of data needs to be inserted into TiDB, tens of megabytes per second.
There’s no need to use TiCDC for importing data. It’s better to just use Dumpling and Lightning directly.
It’s not about importing data, it’s about concurrent data insertion by a normal program.
I don’t understand. If you need to use a program to insert data, then you can only use insert. How does DM come into this? If it’s an SQL file or CSV, just use the Lightning tool directly.
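If the data really does come from an application, one common way to raise insert throughput is to send multi-row INSERT statements in batches instead of one row per statement. The sketch below only builds the statements and parameters; the table name `events`, the column names, and the batch size are hypothetical, and the `%s` placeholder style assumes a MySQL-protocol driver such as PyMySQL, which is not shown here.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multi_row_insert(
    table: str, columns: Sequence[str], batch: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one parameterized multi-row INSERT for a batch of rows.

    `table` and `columns` are interpolated directly into the SQL text,
    so they must be trusted identifiers; only the values are parameters.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(batch)),
    )
    params = [value for row in batch for value in row]
    return sql, params


if __name__ == "__main__":
    rows = [(1, "a"), (2, "b"), (3, "c")]
    for batch in chunked(rows, 2):
        sql, params = build_multi_row_insert("events", ["id", "name"], batch)
        # With a real driver this would be: cursor.execute(sql, params)
        print(sql, params)
```

Each `(sql, params)` pair would be passed to the driver's `cursor.execute`, with several worker connections running batches concurrently. A suitable batch size depends on row width and cluster settings, so it is left as a parameter.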
I understand, it was my misunderstanding.
Use Lightning to import SQL or CSV files. DM is used for synchronization.
The DM tool can improve the efficiency, reliability, and flexibility of data import, while also ensuring data consistency and integrity.