How to Optimize the Efficiency of Synchronizing Data from HDFS to TiDB Using Sqoop?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: sqoop 将hdfs上的数据同步至TiDB 的效率该如何调优?

| username: TiDBer_RbcaAMmi

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】6.5
【Reproduction Path】
【Encountered Problem: Problem Phenomenon and Impact】Using Sqoop to synchronize 300 million rows of data from Hive to TiDB took nearly 8 hours…
【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】

| username: 有猫万事足 | Original post link

I can only help up to this point. For the remaining issues, I suggest asking in the Sqoop community.

| username: 友利奈绪 | Original post link

Why not try DataX?

| username: Jellybean | Original post link

When importing data with Sqoop, it's important to control the batch size of each write: if the batch size is too large, it can cause performance problems on the TiDB side. In large-scale ingestion scenarios, also watch for cluster write hotspots and TiKV write-performance bottlenecks. Take preventive measures and monitor the import as it runs. If issues come up, you can search the forum for targeted solutions; most common problems have been covered there.
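As a rough sketch of the batch-size controls mentioned above (hostnames, database/table names, paths, and all tuning values are placeholders, not from the original post), a Sqoop export to TiDB might look like this. The `-D` properties and `--batch` are standard Sqoop export knobs; TiDB speaks the MySQL protocol on port 4000:

```
# Illustrative only -- hosts, names, and numbers are assumptions.
sqoop export \
  -Dsqoop.export.records.per.statement=1000 \
  -Dsqoop.export.statements.per.transaction=10 \
  --connect "jdbc:mysql://tidb-host:4000/testdb?rewriteBatchedStatements=true" \
  --username root \
  --password-file /user/hadoop/.tidb.pwd \
  --table target_table \
  --export-dir /user/hive/warehouse/testdb.db/source_table \
  --input-fields-terminated-by '\001' \
  --batch \
  --num-mappers 8
```

Smaller per-transaction batches keep each TiDB transaction well under its size limits, and a moderate mapper count limits concurrent write pressure on any single hot Region.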

| username: 随便改个用户名 | Original post link

Use DataX; increasing the channel count should make it quite fast.
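The channel count lives in the `speed` section of the DataX job JSON. A minimal sketch using `hdfsreader` and `mysqlwriter` (TiDB is MySQL-compatible); all connection details and values here are placeholders:

```json
{
  "job": {
    "setting": {
      "speed": { "channel": 8 }
    },
    "content": [{
      "reader": {
        "name": "hdfsreader",
        "parameter": {
          "defaultFS": "hdfs://namenode:8020",
          "path": "/user/hive/warehouse/testdb.db/source_table/*",
          "fileType": "text",
          "fieldDelimiter": "\u0001",
          "column": ["*"]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "username": "root",
          "password": "****",
          "batchSize": 1000,
          "column": ["*"],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://tidb-host:4000/testdb?rewriteBatchedStatements=true",
            "table": ["target_table"]
          }]
        }
      }
    }]
  }
}
```

More channels mean more parallel reader/writer pairs; raise the count gradually while watching TiKV write latency, since too much parallelism can recreate the hotspot problems mentioned above.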