How to Improve QPS for High-Concurrency and Large-Volume Data Insertion in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb高并发大批量数据写入,如何提高qps

| username: TiDBer_Xy7fsN7j

TiDB database configuration: 8-core CPU, 8 GB memory, SSD. In my test, data is inserted in batches of 1000 rows, and QPS tops out at about 15, which works out to roughly 15,000 rows/s. How can I improve the QPS?

Also, a single batch insert of 1000 records can take up to 2.5 seconds.
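For reference, here is the arithmetic implied by these numbers (a sketch; the per-connection figure assumes batches are issued serially on each connection):

```python
batch_rows = 1000        # rows per INSERT statement
batch_latency_s = 2.5    # observed time for one 1000-row batch
qps = 15                 # batches committed per second, cluster-wide

rows_per_s = batch_rows * qps                 # overall: 15,000 rows/s
per_conn = batch_rows / batch_latency_s       # one connection: 400 rows/s
implied_concurrency = rows_per_s / per_conn   # ~37.5 batches in flight
```

So the cluster is already absorbing many concurrent batches; each individual batch is just slow.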

| username: 托马斯滑板鞋 | Original post link

Is the CPU maxed out? Are you putting 1000 value groups in a single SQL statement?

| username: TiDBer_Xy7fsN7j | Original post link

The insert statement is like this, with 1000 values.
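For readers unfamiliar with this pattern, a multi-row INSERT like the one described can be built as follows (table and column names here are made up for illustration):

```python
# Hypothetical 1000-row batch; each row has two columns.
rows = [(i, f"name-{i}") for i in range(1000)]

# One INSERT with 1000 "(%s, %s)" value groups.
placeholders = ", ".join(["(%s, %s)"] * len(rows))
sql = f"INSERT INTO t (id, name) VALUES {placeholders}"

# Flatten the row tuples into one parameter list for the driver.
params = [v for row in rows for v in row]
# cursor.execute(sql, params)  # via any MySQL-protocol driver
```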

| username: 托马斯滑板鞋 | Original post link

The bottleneck is the CPU. If you don’t mind the hassle, you can try version 7.5. According to official data, the insert performance has doubled.

| username: TiDBer_Xy7fsN7j | Original post link

I don’t quite understand. The server has an 8-core CPU, and the CPU utilization is only 80%. There should still be a lot of CPU resources available, right?

| username: 托马斯滑板鞋 | Original post link

:joy: That 80% is just a value collected by the Dashboard itself. Run `dstat 1` on the host to see the raw numbers. CPU usage includes not only user time but also sys time. Unless the concurrent pressure is extreme, it is impossible to sit at 100% for long; 85% is already effectively full load.

| username: TiDBer_Xy7fsN7j | Original post link

What parameters can be set to allow the database to use the system’s CPU to the maximum?

| username: 像风一样的男子 | Original post link

The simplest way to improve QPS is to split the SQL and execute inserts in parallel.
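A minimal sketch of this idea in Python: split one large batch into smaller batches and insert them from several threads. `insert_batch` here is a stand-in for a real driver call, not TiDB-specific API:

```python
from concurrent.futures import ThreadPoolExecutor

def insert_batch(rows):
    # In real code: borrow a connection from a pool and execute a
    # multi-row INSERT with these rows; here we just count them.
    return len(rows)

def parallel_insert(rows, batch_size=100, workers=10):
    # Split into smaller batches and run them concurrently.
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(insert_batch, batches))

inserted = parallel_insert(list(range(1000)))
```

Each thread should use its own connection; with N threads the total throughput is roughly N times the single-connection rate, until CPU or IO saturates.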

| username: TiDBer_Xy7fsN7j | Original post link

You mean inserting fewer records per batch? But the final throughput = number of records per insert * QPS. If QPS goes up and the number of records per insert goes down, the product is still about the same.

| username: 像风一样的男子 | Original post link

Is there only one TiDB node? It looks like your cluster is a single machine.

| username: 有猫万事足 | Original post link

You can first try following the best practices.

| username: zhanggame1 | Original post link

What does your cluster topology look like, and how many TiKV nodes are there?

| username: dba远航 | Original post link

It looks like the CPU processing capability is limited.

| username: 随缘天空 | Original post link

Try using multithreading, executing insert operations asynchronously with multiple threads in the code.

| username: 孤君888 | Original post link

Rather than packing many values into one statement, try inserting concurrently. This point seems to differ from MySQL, right?

| username: h5n1 | Original post link

Here are a few issues related to Insert:

  1. When an INSERT carries many value groups, the parse/compile time can be excessively long. To mitigate this, you can try the non-prepared plan cache feature and reduce the number of value groups per statement. Caching such large statements requires an optimizer fix control to raise the limit on the number of values allowed, and the feature is still experimental.
    Optimizer Fix Controls | PingCAP Documentation Center

  2. Massive inserts can put heavy pressure on IO. The solution is faster disks and more TiKV nodes. If IO is not the bottleneck, you can enlarge some TiKV thread pools to push more write throughput down to the storage layer.

  3. TiDB’s separation of storage and compute inevitably costs some performance. Also, when connecting to TiDB it is best to spread insert connections across multiple TiDB servers through a load balancer.
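On point 3, client-side round-robin over several TiDB endpoints can be sketched like this (hostnames are placeholders; in production an L4 load balancer such as HAProxy usually does this instead of the client):

```python
import itertools

# Assumed endpoints; replace with your real TiDB server addresses.
TIDB_ENDPOINTS = ["tidb-0:4000", "tidb-1:4000", "tidb-2:4000"]
_next_endpoint = itertools.cycle(TIDB_ENDPOINTS).__next__

def connect():
    # Pick the next endpoint in round-robin order.
    host, port = _next_endpoint().split(":")
    # In real code: return driver.connect(host=host, port=int(port), ...)
    return (host, int(port))
```

Each new connection then lands on a different TiDB server, spreading the parse/compile CPU cost across nodes.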

| username: 小于同学 | Original post link

Add a few more TiDB server nodes.