Continuous Insertion of Big Data

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于大数据持续插入

| username: zqk_zqk

[TiDB Usage Environment] Production / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
How can I optimize the performance of an insert into select statement? On my side, querying one month of data (over 1 million rows) takes about 1 second when queried day by day. However, executing the full statement (inserting those 1 million+ rows) takes around 400 seconds. The select side has already been optimized well, so the time is presumably being spent on the insert into part. Could the experts please provide some guidance?

| username: Billmay表妹 | Original post link

Here are some optimization methods for insert into select statements:

  1. Adjust TiDB Configuration Parameters

You can improve insertion performance by adjusting TiDB configuration parameters, such as:

  • tikv-client.grpc-connection-count: Adjusting the number of gRPC connections for the TiKV client can improve concurrent insertion performance.
  • tikv-client.grpc-keepalive-time and tikv-client.grpc-keepalive-timeout: Adjusting the keepalive time and timeout for gRPC connections of the TiKV client can reduce the creation and destruction of connections, thereby improving insertion performance.
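As an illustration, these parameters live in the TiDB configuration file; the values below are placeholders to show the shape of the config, not tuned recommendations:

```toml
# tidb.toml -- illustrative values only; tune for your own workload
[tikv-client]
# Number of gRPC connections per TiKV store
grpc-connection-count = 8
# Keepalive probe interval and timeout for gRPC connections, in seconds
grpc-keepalive-time = 10
grpc-keepalive-timeout = 3
```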
  2. Use LOAD DATA to Import Data

If the data source is a file, you can use LOAD DATA to import it, which is usually faster than insert into select. LOAD DATA goes through TiDB’s batch-write path, writing data to TiKV in batches and avoiding the overhead of inserting row by row.
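A minimal sketch, assuming the month’s data has first been exported to a CSV file (the path, table name, and delimiters here are assumptions):

```sql
-- Load a CSV file into the target table in batches;
-- requires local-infile to be enabled on the client side
LOAD DATA LOCAL INFILE '/tmp/month_data.csv'
INTO TABLE target_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```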

  3. Adjust Transaction Size

You can improve insertion performance by adjusting the transaction size. If the transaction is too small, it will lead to frequent commits and writes to RocksDB WAL, reducing insertion performance. If the transaction is too large, it will lead to lock contention and high memory usage, also reducing insertion performance. You can adjust the transaction size according to the actual situation, generally recommended to be between 1MB and 10MB.
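One way to keep each transaction at a manageable size is to split the single large statement into several smaller ones. As a sketch: newer TiDB versions (v6.5 and later, if I recall correctly) support non-transactional DML, which automatically splits one insert into multiple transactions keyed on a column of the source table (table and column names below are assumptions):

```sql
-- Split the insert into batches of 10000 rows each, sharded on t.id;
-- each batch commits as its own transaction
BATCH ON id LIMIT 10000
INSERT INTO target_table
SELECT * FROM t
WHERE create_time >= '2023-01-01' AND create_time < '2023-02-01';
```

On older versions, the same effect can be approximated by issuing several inserts, each covering a sub-range of the source data (for example, one statement per day).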

  4. Use Batch Insertion

You can merge multiple insert statements into one and use batch insertion to insert data. For example, using insert into ... values (), (), ... to insert multiple rows can reduce the overhead of insert statements and improve insertion performance.
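For example, instead of one statement per row, rows can be grouped into a single multi-value insert (the table and column names are illustrative):

```sql
-- One statement carrying several rows instead of one INSERT per row
INSERT INTO target_table (id, name)
VALUES (1, 'a'), (2, 'b'), (3, 'c');
```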

  5. Adjust TiKV Cluster Configuration

An unreasonable TiKV cluster configuration will also hurt insertion performance. You can improve it by adjusting the TiKV configuration, such as:

  • Adjusting RocksDB configuration parameters, such as write_buffer_size, max_write_buffer_number, min_write_buffer_number_to_merge, etc., can improve write performance.
  • Adjusting Raft configuration parameters, such as raft-store-pool-size, raft-apply-pool-size, etc., can improve Raft’s concurrent processing capability.
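For illustration, these knobs sit in the TiKV configuration file. Note that in recent TiKV versions the Raft pool sizes are spelled store-pool-size and apply-pool-size under the raftstore section; the values below are placeholders, not recommendations:

```toml
# tikv.toml -- illustrative values only; tune against your hardware
[rocksdb.defaultcf]
write-buffer-size = "128MB"
max-write-buffer-number = 5
min-write-buffer-number-to-merge = 1

[raftstore]
# Threads handling Raft I/O and log appends
store-pool-size = 2
# Threads applying committed Raft log entries to RocksDB
apply-pool-size = 2
```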

I hope the above methods can help you.