210 Fields, 610 Million Rows: Writing from Hive to TiDB via DataX with High Concurrency Takes 13-20 Hours at 8,000-17,000 Rows per Second

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 210 fields, 610 million rows, writing from Hive to TiDB via DataX with high concurrency takes 13-20 hours, 8,000-17,000 rows written per second

| username: 卡卡其其

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.0
[Reproduction Path] Syncing data to TiDB with DataX: write speed around 9,000 rows/s, peaking at 17,000 rows/s; channel = 10, batchSize = 500; HDFS files around 256 MB each
[Encountered Problem: Phenomenon and Impact] Writing speed is too slow
[Resource Configuration] 2g-4g
[Attachment: Screenshot/Log/Monitoring]
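As a rough sanity check on the figures above, total sync time is just row count divided by sustained throughput, and the per-channel batch rate follows from channel = 10 and batchSize = 500. The sketch below assumes exactly 610 million rows, a constant write rate for the whole run, and that each DataX channel commits its batches one after another; real jobs will not hold any of these exactly.

```python
# Back-of-the-envelope throughput math for the DataX job described above.
# Assumptions: exactly 610 million rows, a constant write rate for the whole
# run, and each DataX channel committing its batches strictly one at a time.

TOTAL_ROWS = 610_000_000
CHANNELS = 10
BATCH_SIZE = 500


def hours_to_sync(total_rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed to write total_rows at a fixed rate."""
    return total_rows / rows_per_second / 3600


def batches_per_channel_per_second(rows_per_second: float, channels: int, batch_size: int) -> float:
    """How many batchSize-row batches each channel commits per second."""
    return rows_per_second / channels / batch_size


for rate in (8_000, 9_000, 17_000):
    hours = hours_to_sync(TOTAL_ROWS, rate)
    batches = batches_per_channel_per_second(rate, CHANNELS, BATCH_SIZE)
    # Seconds spent per batch on one channel, if batches are strictly serial.
    seconds_per_batch = 1 / batches
    print(f"{rate:>6} rows/s: {hours:5.1f} h total, "
          f"{batches:.1f} batches/s per channel (~{seconds_per_batch:.2f} s per batch)")
```

At the typical 9,000 rows/s this works out to about 18.8 hours, consistent with the 13-20 hours reported, and means each channel spends roughly 0.56 s per 500-row batch of 210-column rows. Finishing in about 2 hours would need roughly 85,000 rows/s, about 9.4 times the typical rate.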

| username: 数据小黑

Since you already have Hive, why not try writing with TiSpark? It may give better speed and concurrency, and it could make fuller use of the hardware.

| username: tidb狂热爱好者

Hmm, for a Hive source: DataX or CloudCanal.