Experts, I have a question. If enough CPU is allocated for physical import, can the speed reach 5G or 10G?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大佬我问一个问题 如果物理导入 cpu给够了 速度能到5g 10g吗

| username: tidb狂热爱好者

If physical import is given enough CPU, can the import speed reach 5GB or 10GB?

| username: 连连看db | Original post link

I/O is also a limiting factor.

| username: 像风一样的男子 | Original post link

The bottleneck is probably network and disk I/O. TiDB Lightning also supports parallel import, which lets you run multiple import tasks on multiple machines at the same time.

When using TiDB Lightning in parallel import mode, the following restrictions are recommended to achieve optimal performance:

  • Each TiDB Lightning instance should be deployed on a separate machine. TiDB Lightning will consume all CPU resources by default, so deploying multiple instances on a single machine will not improve performance.
  • The total size of the source files imported by each TiDB Lightning instance should not exceed 5 TiB.
  • The total number of TiDB Lightning instances should not exceed 10.

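As a rough illustration of the two sizing limits listed above, here is a minimal sketch that works out how many Lightning instances (one per machine) a data set needs. `plan_instances` and `split_evenly` are hypothetical helpers written for this example, not part of TiDB Lightning, and the even split is an assumption; real source files rarely divide that neatly.

```python
import math

# Limits from the parallel-import recommendations quoted above.
MAX_TIB_PER_INSTANCE = 5   # max source data per Lightning instance, in TiB
MAX_INSTANCES = 10         # max total number of Lightning instances


def plan_instances(total_tib: float) -> int:
    """Hypothetical helper: minimum number of Lightning instances so that
    no instance imports more than 5 TiB of source files.
    Raises ValueError if that exceeds the recommended 10 instances."""
    if total_tib <= 0:
        raise ValueError("total_tib must be positive")
    needed = math.ceil(total_tib / MAX_TIB_PER_INSTANCE)
    if needed > MAX_INSTANCES:
        raise ValueError(
            f"{total_tib} TiB needs {needed} instances, "
            f"above the recommended {MAX_INSTANCES}"
        )
    return needed


def split_evenly(total_tib: float, instances: int) -> list[float]:
    """Hypothetical even split of the source data across instances (assumption)."""
    share = total_tib / instances
    return [share] * instances


if __name__ == "__main__":
    n = plan_instances(23)
    print(n, split_evenly(23, n))
```

With these limits, a single parallel import tops out at 50 TiB of source data (10 instances × 5 TiB).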
| username: Demo二棉裤 | Original post link

The official documentation cites 500 GB/h, but in my experience we get around 300 GB/h, probably depending on the characteristics of the data. That's still not slow, and it can be improved by running Lightning imports in parallel.

| username: 江湖故人 | Original post link

With a 10-gigabit network, the bottleneck is more likely to be disk I/O.

| username: 江湖故人 | Original post link

300 GB/h ≈ 85 MB/s. Assuming 1 MB holds about 10,000 rows, that's about 850,000 rows per second, which is quite fast :grinning:
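The arithmetic above can be checked with a small sketch. It assumes 1 GB = 1024 MB (which gives the ~85 MB/s figure; with 1 GB = 1000 MB the result is ~83 MB/s) and takes the 10,000 rows per MB figure from the post as a given; both helper names are made up for this example.

```python
SECONDS_PER_HOUR = 3600


def gb_per_hour_to_mb_per_s(gb_per_hour: float, binary: bool = True) -> float:
    """Convert GB/h to MB/s. binary=True uses 1 GB = 1024 MB."""
    mb_per_gb = 1024 if binary else 1000
    return gb_per_hour * mb_per_gb / SECONDS_PER_HOUR


def rows_per_second(mb_per_s: float, rows_per_mb: int) -> float:
    """Rows/s for a given data rate and an assumed row density."""
    return mb_per_s * rows_per_mb


mb_s = gb_per_hour_to_mb_per_s(300)
print(round(mb_s, 1))                         # 85.3
print(round(rows_per_second(mb_s, 10_000)))   # 853333
```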

| username: zhanggame1 | Original post link

Physical import also depends on hard drive performance.

| username: 随缘天空 | Original post link

That's still very hard to reach. Physical import tops out at around 500 GB per hour, and probably only with high-end hardware; 200-300 GB per hour is more typical.
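To put the gap in numbers, here is a rough sketch that assumes "5G"/"10G" in the question means GB per second and that total throughput scales linearly with the number of Lightning instances. That is an optimistic assumption: the TiKV, disk and network limits raised in this thread are ignored. `instances_needed` is a hypothetical helper.

```python
import math


def instances_needed(target_gb_per_s: float, per_instance_gb_per_h: float) -> int:
    """Instances needed to reach a target rate, ASSUMING linear scaling
    with the number of Lightning instances (TiKV/disk/network limits ignored)."""
    target_gb_per_h = target_gb_per_s * 3600
    return math.ceil(target_gb_per_h / per_instance_gb_per_h)


for target in (5, 10):
    for per_instance in (300, 500):
        n = instances_needed(target, per_instance)
        print(f"{target} GB/s at {per_instance} GB/h per instance -> {n} instances")
```

Even the best case (5 GB/s at 500 GB/h per instance) needs 36 instances, well above the recommended maximum of 10.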

| username: okenJiang | Original post link

Bro, that's a pretty high target. Try testing with parallel import: the only way to go faster is to scale out horizontally, with every machine's hardware fully utilized. Also, be careful not to overwhelm the downstream TiKV cluster.

| username: YuchongXU | Original post link

Neither the network card nor the hard drive can reach that speed anyway.

| username: forever | Original post link

What kind of data is that, where 10,000 rows take up only 1MB? :sweat_smile:

| username: 像风一样的男子 | Original post link

Increasing the number of clients for parallel import can theoretically improve resource utilization.

| username: 小龙虾爱大龙虾 | Original post link

Reaching 10G/s requires enough bandwidth in the first place, right?
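A back-of-the-envelope sketch of the bandwidth point, assuming "10G/s" means 10 gigabytes per second and using the 10-gigabit network mentioned earlier in the thread. It only counts raw payload and ignores protocol overhead and any extra traffic from encoding or replication, so real requirements would be higher. The helper names are made up for this example.

```python
def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert a data rate from GB/s to Gbit/s (8 bits per byte)."""
    return gbytes_per_s * 8


def nic_ceiling_gbytes(nic_gbits: float) -> float:
    """Theoretical payload ceiling of a NIC in GB/s, ignoring protocol overhead."""
    return nic_gbits / 8


for target in (5, 10):
    need = gbytes_to_gbits(target)
    print(f"{target} GB/s needs {need:g} Gbit/s, i.e. {need / 10:g} x 10GbE links")
print(f"one 10GbE link tops out at about {nic_ceiling_gbytes(10):g} GB/s")
```

A single 10GbE link carries at most about 1.25 GB/s, so 5 GB/s would need the equivalent of four such links and 10 GB/s eight, before disk I/O is even considered.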

| username: zhaokede | Original post link

Bandwidth and SSD could both become bottlenecks.

| username: redgame | Original post link

The bottleneck is IO.

| username: 江湖故人 | Original post link

If 10,000 rows take up 1MB, each row is roughly 100 bytes. A character takes 1 byte and an integer 4 bytes, so you can still fit quite a lot into a row.

| username: forever | Original post link

Rows that small are rare. Our large tables have rows over 4KB.
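Row size changes the rows-per-second figure dramatically at the same byte rate. This sketch compares the ~100-byte rows assumed earlier with 4 KB rows at the 85 MB/s discussed above, assuming 1 MB = 1024 × 1024 bytes. The function names are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024


def rows_per_mb(row_bytes: int) -> float:
    """Approximate rows per MB for a given average row size."""
    return BYTES_PER_MB / row_bytes


def throughput_rows_per_s(mb_per_s: float, row_bytes: int) -> float:
    """Rows/s at a given data rate and average row size."""
    return mb_per_s * rows_per_mb(row_bytes)


for row_bytes in (100, 4096):
    print(f"{row_bytes:>5} B/row: {rows_per_mb(row_bytes):>8.0f} rows/MB, "
          f"{throughput_rows_per_s(85, row_bytes):>9.0f} rows/s at 85 MB/s")
```

At the same 85 MB/s, 4 KB rows give only about 21,760 rows per second, compared with roughly 890,000 for 100-byte rows.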

| username: DBAER | Original post link

This mainly depends on disk I/O.

| username: 源de爸 | Original post link

I/O and network are both potential bottlenecks.

| username: zhang_2023 | Original post link

Consider the IO bottleneck.