Write 1.2 million records to TiDB within 10 seconds

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在10s内写入120w数据到tidb

| username: TiDBer_Xy7fsN7j

What is the best way to write 1.2 million records into TiDB within 10 seconds? Should I use JDBC inserts or LOAD DATA?

| username: Jellybean | Original post link

Import speed is closely tied to your hardware configuration, the target table structure, the number and size of columns per row, and the SQL you run. Please share more details.

Usually, physical import with tools like Lightning or BR is the fastest, but you also need to consider whether the target cluster can be taken offline and whether business traffic is running on it.

| username: Jayjlchen | Original post link

It depends on the concurrency: for example, LOAD DATA with 40 concurrent threads, or fewer threads if you have already dealt with write hotspots for inserts. This assumes there are no resource bottlenecks (NVMe disks, for example), and some TiKV rate-limiting parameters may need to be adjusted.

| username: FutureDB | Original post link

You can try using the Lightning import tool, which has two modes:

  1. Local mode: It affects the cluster and requires empty tables, but it is fast.
  2. TiDB mode: It has little impact on the cluster and does not require empty tables, but it is slow. Essentially, it performs batch replace operations.

| username: Kongdom | Original post link

:flushed: 120,000 records per second? I’ve managed to reach 20,000-30,000 records per second with Kettle, but 100,000 per second seems a bit high, and it also depends on the size of each record. JDBC probably can’t meet this requirement.

| username: zhanggame1 | Original post link

I have stress-tested with 500 threads inserting rows one by one, and throughput exceeded 100,000 rows per second. If it's just a data import, with each INSERT carrying more rows, a few concurrent threads are enough.

| username: TiDBer_jYQINSnf | Original post link

Is this real-time business insertion?
Make each INSERT batch as large as possible, i.e., carry more rows in the VALUES list.
Use multiple connections.
If each row is about 1 KB, then 120,000 rows per second is only around 120 MB per second of writes, which is not that large. Increase the write buffer size. See the sketch below.
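
A minimal JDBC sketch of this advice, assuming a hypothetical table `t(id BIGINT, payload VARCHAR(255))` on a local TiDB listening on the default port 4000; the batch size of 1,000 is illustrative, not a tuned value. With MySQL Connector/J, `rewriteBatchedStatements=true` makes the driver collapse an `addBatch()` batch into one multi-row INSERT, which is the "more values per insert" idea in driver form:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: batching with MySQL Connector/J. rewriteBatchedStatements=true makes
// the driver rewrite the batch into one multi-row INSERT, so each round trip
// carries many rows. Table t(id BIGINT, payload VARCHAR(255)) is hypothetical.
public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // The rewrite requires client-side prepared statements
        // (useServerPrepStmts=false, the Connector/J default).
        String url = "jdbc:mysql://127.0.0.1:4000/test"
                   + "?rewriteBatchedStatements=true&useServerPrepStmts=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t (id, payload) VALUES (?, ?)")) {
            for (int i = 0; i < 1000; i++) {   // 1,000 rows per batch (illustrative)
                ps.setLong(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();
            }
            ps.executeBatch();                 // one (rewritten) round trip
        }
    }
}
```

To apply the multiple-connections advice, run several such workers, each with its own connection.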

| username: Kongdom | Original post link

:+1: :+1: :+1: It’s my HDD that can’t keep up :joy:

| username: dba远航 | Original post link

This depends on the machine’s hardware configuration, as well as the concurrency level and the tools used.

| username: zhaokede | Original post link

As long as network bandwidth and I/O keep up, there are many options, such as writing Java code directly or using a data import tool like Kettle.

| username: WinterLiu | Original post link

I feel that JDBC doesn’t have the capability for this. An import tool should work, but it also depends on the server hardware.

| username: TiDBer_jYQINSnf | Original post link

A single server is not enough; you need multiple servers for writing. I’ve seen JDBC handle hundreds of thousands of QPS.

| username: xfworld | Original post link

Just look at this.

| username: xingzhenxiang | Original post link

IMPORT INTO
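
For reference, `IMPORT INTO` is TiDB's built-in physical import statement (introduced in v7.2.0). A minimal sketch issuing it over JDBC, where the table `t` and the CSV path are hypothetical placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: run IMPORT INTO through a regular connection. Requires a TiDB
// version that supports it (v7.2.0+); table t and the file path are
// hypothetical placeholders.
public class ImportIntoSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement()) {
            // Physical import of staged CSV files; runs server-side and
            // bypasses row-by-row SQL execution.
            stmt.execute("IMPORT INTO t FROM '/data/t.*.csv'");
        }
    }
}
```

Like Lightning's local backend, it skips normal SQL execution, so it is much faster than INSERTs, but the target table must be empty.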

| username: 江湖故人 | Original post link

You can try Lightning’s parallel import.
TiDB Lightning Parallel Import | PingCAP Documentation Center

| username: wfxxh | Original post link

lightning +1

| username: 随缘天空 | Original post link

You can insert using JDBC, with each SQL statement carrying 500 rows. One thread runs 20 such statements, and multiple threads run in parallel. A sketch of this follows below.
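
A rough sketch of that plan, assuming the same kind of hypothetical table `t(id BIGINT, payload VARCHAR(255))`: 500 rows per statement, 20 statements per worker, each worker on its own connection. At 12 workers that is 12 × 20 × 500 = 120,000 rows per round, so covering the full 1.2 million within 10 seconds means looping the workers or running many more of them:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the plan above: 500 rows per statement, 20 statements per worker,
// workers in parallel, each with its own connection. Table t(id, payload)
// and the thread count are hypothetical; 12 x 20 x 500 = 120,000 rows per run.
public class ParallelInsertSketch {
    static final int ROWS = 500, STMTS = 20, THREADS = 12;
    static final AtomicLong SEQ = new AtomicLong();

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/test";
        StringBuilder sb = new StringBuilder("INSERT INTO t (id, payload) VALUES ");
        for (int i = 0; i < ROWS; i++) sb.append(i == 0 ? "(?, ?)" : ",(?, ?)");
        String sql = sb.toString();

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                try (Connection conn = DriverManager.getConnection(url, "root", "");
                     PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int s = 0; s < STMTS; s++) {       // 20 statements per worker
                        for (int r = 0; r < ROWS; r++) {    // 500 rows per statement
                            long id = SEQ.incrementAndGet();
                            ps.setLong(r * 2 + 1, id);
                            ps.setString(r * 2 + 2, "row-" + id);
                        }
                        ps.executeUpdate();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS); // wait for workers to finish
    }
}
```

Note that monotonically increasing IDs like `SEQ` here concentrate writes on one region and can create the hotspot mentioned earlier in the thread; in a real table, `AUTO_RANDOM` or `SHARD_ROW_ID_BITS` helps spread the load.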

| username: 这里介绍不了我 | Original post link

Sounds very powerful.

| username: 哈喽沃德 | Original post link

Which approach to use depends on your business, and whether it achieves the desired result depends on your configuration.

| username: Soysauce520 | Original post link

Lightning’s local mode should be the fastest.