Write 1.2 million records to TiDB within 10 seconds

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在10s内写入120w数据到tidb

| username: TiDBer_Xy7fsN7j

What is the best way to write 1.2 million records into TiDB within 10 seconds? Should I use JDBC inserts or LOAD DATA?

| username: Jellybean | Original post link

Import speed is closely tied to your hardware configuration, the target table structure, the number and size of columns per row, and the SQL you run. Please share more details.

Usually, physical import with tools like Lightning or BR is the fastest, but you also need to consider whether the target cluster can be taken offline and whether business traffic is running on it.

| username: Jayjlchen | Original post link

It depends on the concurrency: for example, LOAD DATA with 40 concurrent threads, or fewer threads if you have already dealt with write hotspots for inserts. This assumes there are no resource bottlenecks (NVMe disks, for example), and some TiKV rate-limiting parameters may need to be adjusted.

| username: FutureDB | Original post link

You can try using the Lightning import tool, which has two modes:

  1. Local mode: It affects the cluster and requires empty tables, but it is fast.
  2. TiDB mode: It has little impact on the cluster and does not require empty tables, but it is slow. Essentially, it performs batch replace operations.

| username: Kongdom | Original post link

:flushed: 120,000 records per second? I’ve managed to reach 20,000-30,000 records per second with Kettle, but 100,000 per second seems a bit high, and it also depends on the size of each record. JDBC probably can’t meet this requirement.

| username: zhanggame1 | Original post link

I have stress-tested with 500 threads inserting rows one by one, and throughput exceeded 100,000 rows per second. If it's just a data import, with each INSERT carrying more rows, a few concurrent threads are enough.

| username: TiDBer_jYQINSnf | Original post link

Is this real-time business insertion?
Make each INSERT batch as large as possible, i.e., carry more rows in the VALUES list.
Use multiple connections.
If each row is about 1 KB, then 120,000 rows per second is only around 120 MB per second of writes, which is not that large. Increase the write buffer size. See the sketch below.
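
A minimal JDBC sketch of this advice, assuming a hypothetical table `t(id BIGINT, payload VARCHAR(255))` on a local TiDB listening on the default port 4000; the batch size of 1,000 is illustrative, not a tuned value. With MySQL Connector/J, `rewriteBatchedStatements=true` makes the driver collapse an `addBatch()` batch into one multi-row INSERT, which is the "more values per insert" idea in driver form:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: batching with MySQL Connector/J. rewriteBatchedStatements=true makes
// the driver rewrite the batch into one multi-row INSERT, so each round trip
// carries many rows. Table t(id BIGINT, payload VARCHAR(255)) is hypothetical.
public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        // The rewrite requires client-side prepared statements
        // (useServerPrepStmts=false, the Connector/J default).
        String url = "jdbc:mysql://127.0.0.1:4000/test"
                   + "?rewriteBatchedStatements=true&useServerPrepStmts=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO t (id, payload) VALUES (?, ?)")) {
            for (int i = 0; i < 1000; i++) {   // 1,000 rows per batch (illustrative)
                ps.setLong(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();
            }
            ps.executeBatch();                 // one (rewritten) round trip
        }
    }
}
```

To apply the multiple-connections advice, run several such workers, each with its own connection.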

| username: Kongdom | Original post link

:+1: :+1: :+1: It’s my HDD that can’t keep up :joy:

| username: dba远航 | Original post link

This depends on the machine’s hardware configuration, as well as the concurrency level and the tools used.

| username: zhaokede | Original post link

As long as network bandwidth and I/O keep up, there are many options, such as writing Java code directly or using a data import tool like Kettle.

| username: WinterLiu | Original post link

I feel that JDBC doesn’t have the capability for this. An import tool should work, but it also depends on the server hardware.

| username: TiDBer_jYQINSnf | Original post link

A single server is not enough; you need multiple servers for writing. I’ve seen JDBC handle hundreds of thousands of QPS.

| username: xfworld | Original post link

Just look at this.

| username: xingzhenxiang | Original post link

IMPORT INTO
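
For reference, `IMPORT INTO` is TiDB's built-in physical import statement (introduced in v7.2.0). A minimal sketch issuing it over JDBC, where the table `t` and the CSV path are hypothetical placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: run IMPORT INTO through a regular connection. Requires a TiDB
// version that supports it (v7.2.0+); table t and the file path are
// hypothetical placeholders.
public class ImportIntoSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement()) {
            // Physical import of staged CSV files; runs server-side and
            // bypasses row-by-row SQL execution.
            stmt.execute("IMPORT INTO t FROM '/data/t.*.csv'");
        }
    }
}
```

Like Lightning's local backend, it skips normal SQL execution, so it is much faster than INSERTs, but the target table must be empty.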

| username: 江湖故人 | Original post link

You can try Lightning’s parallel import.
TiDB Lightning Parallel Import | PingCAP Documentation Center

| username: wfxxh | Original post link

lightning +1

| username: 随缘天空 | Original post link

You can insert using JDBC, with each SQL statement carrying 500 rows. One thread runs 20 such statements, and multiple threads run in parallel. A sketch of this follows below.
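
A rough sketch of that plan, assuming the same kind of hypothetical table `t(id BIGINT, payload VARCHAR(255))`: 500 rows per statement, 20 statements per worker, each worker on its own connection. At 12 workers that is 12 × 20 × 500 = 120,000 rows per round, so covering the full 1.2 million within 10 seconds means looping the workers or running many more of them:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the plan above: 500 rows per statement, 20 statements per worker,
// workers in parallel, each with its own connection. Table t(id, payload)
// and the thread count are hypothetical; 12 x 20 x 500 = 120,000 rows per run.
public class ParallelInsertSketch {
    static final int ROWS = 500, STMTS = 20, THREADS = 12;
    static final AtomicLong SEQ = new AtomicLong();

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://127.0.0.1:4000/test";
        StringBuilder sb = new StringBuilder("INSERT INTO t (id, payload) VALUES ");
        for (int i = 0; i < ROWS; i++) sb.append(i == 0 ? "(?, ?)" : ",(?, ?)");
        String sql = sb.toString();

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                try (Connection conn = DriverManager.getConnection(url, "root", "");
                     PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int s = 0; s < STMTS; s++) {       // 20 statements per worker
                        for (int r = 0; r < ROWS; r++) {    // 500 rows per statement
                            long id = SEQ.incrementAndGet();
                            ps.setLong(r * 2 + 1, id);
                            ps.setString(r * 2 + 2, "row-" + id);
                        }
                        ps.executeUpdate();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS); // wait for workers to finish
    }
}
```

Note that monotonically increasing IDs like `SEQ` here concentrate writes on one region and can create the hotspot mentioned earlier in the thread; in a real table, `AUTO_RANDOM` or `SHARD_ROW_ID_BITS` helps spread the load.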

| username: 这里介绍不了我 | Original post link

Sounds very powerful.

| username: 哈喽沃德 | Original post link

Which approach to use depends on your business, and whether it achieves the desired result depends on your configuration.

| username: Soysauce520 | Original post link

Lightning’s local mode should be the fastest.