Note:
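This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 大佬我问一个问题 如果物理导入 cpu给够了 速度能到5g 10g吗 (A question for the experts: with physical import, if enough CPU is allocated, can the speed reach 5G or 10G?)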

If enough CPU is allocated for physical import, can the import speed reach 5 GB or 10 GB?
The bottleneck is probably network and disk I/O. Also, TiDB Lightning supports parallel import, so you can run multiple import tasks in parallel on multiple machines.
When using TiDB Lightning in parallel import mode, the following restrictions are recommended to achieve optimal performance:
The official documentation states 500 GB/h, but in my experience it’s around 300 GB/h here. That probably depends on the characteristics of the data. Still, it’s not slow, and it can be pushed further with Lightning parallel import.
With a 10-gigabit network, the bottleneck is more likely to be disk I/O.
300 GB/h ≈ 85 MB/s. Assuming 1 MB holds 10,000 rows, that’s about 850,000 rows per second, which is quite fast.
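A quick Python sketch to check the arithmetic above. The helper names are my own, and the 10,000 rows per MB figure is the same assumption as in the post, not a measured value. It uses 1 GB = 1024 MB, which is how 300 GB/h comes out to about 85 MB/s; with decimal units it would be closer to 83 MB/s.

```python
# Convert an hourly import rate into MB/s and rows/s.
# Binary units (1 GB = 1024 MB) to match the ~85 MB/s figure in the thread.
# ROWS_PER_MB is the thread's assumption, not a measurement.

ROWS_PER_MB = 10_000


def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert GB/h to MB/s, using 1 GB = 1024 MB and 3600 s per hour."""
    return gb_per_hour * 1024 / 3600


def rows_per_second(gb_per_hour: float, rows_per_mb: int = ROWS_PER_MB) -> float:
    """Rows imported per second at the given hourly rate."""
    return gb_per_hour_to_mb_per_s(gb_per_hour) * rows_per_mb


if __name__ == "__main__":
    # 200-300 GB/h is the "normal" range quoted here, 500 GB/h the documented peak.
    for rate in (200, 300, 500):
        mb_s = gb_per_hour_to_mb_per_s(rate)
        print(f"{rate} GB/h ≈ {mb_s:.1f} MB/s ≈ {rows_per_second(rate):,.0f} rows/s")
```

The exact result for 300 GB/h is about 85.3 MB/s and roughly 853,000 rows/s, so the 850,000 figure is a fair round-down.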
It’s still very difficult. Physical import tops out at around 500 GB per hour, and that’s probably on high-end hardware. Normally it’s around 200 to 300 GB per hour.
Bro, your requirements are a bit high. Try testing with parallel import. The only option is to scale horizontally, and the hardware has to be top-spec. Also, be careful not to overwhelm the downstream TiKV.
Increasing the number of clients for parallel import can theoretically improve resource utilization.
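A rough sizing sketch for horizontal scaling, assuming throughput scales linearly with the number of Lightning instances and TiKV keeps up. Neither holds exactly in practice, which is what the warning about overwhelming TiKV is about. `instances_needed` is a hypothetical helper, the per-instance rates are the 300 and 500 GB/h figures quoted in this thread, and the target is read as GB per second.

```python
import math

# Hypothetical sizing helper. ASSUMES linear scaling across Lightning instances
# and that downstream TiKV is never the bottleneck; real clusters fall short of this.


def instances_needed(target_gb_per_s: float, per_instance_gb_per_h: float) -> int:
    """Minimum instance count to reach target_gb_per_s under linear scaling."""
    target_gb_per_h = target_gb_per_s * 3600  # seconds per hour
    return math.ceil(target_gb_per_h / per_instance_gb_per_h)


if __name__ == "__main__":
    for target in (5, 10):  # GB/s, as asked in the original question
        for per_instance in (300, 500):  # GB/h per instance, figures from the thread
            n = instances_needed(target, per_instance)
            print(f"{target} GB/s at {per_instance} GB/h each: {n} instances")
```

Even at the optimistic 500 GB/h per instance, 5 GB/s works out to 36 instances under these assumptions, which is why the replies point to horizontal scaling as the only route.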
For 10 GB/s, your network bandwidth has to be sufficient in the first place, right?
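Bits vs. bytes is where this estimate usually goes wrong, so here is a small sketch. It uses decimal units and ignores protocol overhead, so real usable throughput will be lower. The function names are illustrative only.

```python
import math

BITS_PER_BYTE = 8


def required_gbit_per_s(target_gb_per_s: float) -> float:
    """Line rate in Gbit/s needed to move target_gb_per_s GB/s of payload."""
    return target_gb_per_s * BITS_PER_BYTE


def link_capacity_gb_per_s(link_gbit_per_s: float) -> float:
    """Upper bound on payload throughput in GB/s for a link of the given speed."""
    return link_gbit_per_s / BITS_PER_BYTE


def links_needed(target_gb_per_s: float, link_gbit_per_s: float) -> int:
    """Minimum number of links of the given speed needed for the target rate."""
    return math.ceil(required_gbit_per_s(target_gb_per_s) / link_gbit_per_s)


if __name__ == "__main__":
    print(f"One 10 Gbit/s link carries at most {link_capacity_gb_per_s(10)} GB/s")
    for target in (5, 10):
        print(f"{target} GB/s needs {required_gbit_per_s(target):.0f} Gbit/s "
              f"(at least {links_needed(target, 10)} x 10 Gbit/s links)")
```

So a single 10-gigabit link tops out around 1.25 GB/s before overhead, and 5 GB/s already needs about 40 Gbit/s of bandwidth end to end.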
If 10,000 rows take up 1 MB, each row is about 100 bytes. An ASCII character takes 1 byte and an INT takes 4 bytes, so you can still store quite a lot in a row.
Rows that small are rare. Our large tables have rows over 4 KB.
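To see how much row size matters, here is a sketch that holds byte throughput fixed at the ~85 MB/s figure from earlier in the thread and varies the average row size. It uses binary units, and the helper names are made up for illustration.

```python
MIB = 1024 * 1024  # bytes per MB, binary units as in the 85 MB/s estimate


def bytes_per_row(rows_per_mb: int) -> float:
    """Average row size in bytes implied by a rows-per-MB density."""
    return MIB / rows_per_mb


def rows_per_second_at(mb_per_s: float, row_bytes: float) -> float:
    """Rows/s at a fixed byte throughput for a given average row size."""
    return mb_per_s * MIB / row_bytes


if __name__ == "__main__":
    small = bytes_per_row(10_000)  # about 105 bytes, the "10,000 rows per MB" case
    for label, size in (("~100 B rows", small), ("4 KB rows", 4096)):
        print(f"{label}: {rows_per_second_at(85, size):,.0f} rows/s at 85 MB/s")
```

At the same 85 MB/s, 4 KB rows give about 21,760 rows/s versus roughly 850,000 rows/s for ~100-byte rows, so a rows-per-second figure only means something alongside the table's row size.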