The issue of slow write speed to TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 程序写入tidb速度慢的问题

| username: TiDBer_xV0VbTdz

There are already 5 million records in the table.
Inserting data using slicing and asynchronous methods:

// Split the data into batches of 5,000 rows (Hutool CollUtil).
List<List<tableEntity>> split = CollUtil.split(tableList, 5000);
// Submit each batch to the thread pool (executorService has 20 threads).
List<CompletableFuture<Integer>> collect = split.stream().map(dataList ->
    CompletableFuture.supplyAsync(() -> {
        // insertBatch issues a multi-value batch INSERT for the whole batch.
        tableDao.insertBatch(dataList);
        return 1;
    }, executorService)).collect(Collectors.toList());
// Wait for all batches to finish before measuring elapsed time.
CompletableFuture.allOf(collect.toArray(new CompletableFuture[0])).join();

Time consumed to insert 500,000 records with different slice sizes:

Slice size    Time      Throughput
100           55 s      9090 records/s
1000          132 s     3787 records/s
5000          69 s      7246 records/s
50            62 s      8064 records/s

I feel the insertion efficiency is unsatisfactory. How can I improve the insertion speed?
Attached heatmap:

| username: WalterWj | Original post link

Check this out: Insert Data | PingCAP Documentation Center
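
For reference, the main suggestion in that doc is to send multi-value INSERT statements in moderately sized batches. A minimal JDBC sketch of the idea; the connection string, table t, and columns are placeholders, not from the original post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class MultiValueInsertSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection info; TiDB speaks the MySQL protocol on port 4000.
        String url = "jdbc:mysql://127.0.0.1:4000/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement()) {
            // One round trip inserts many rows; keep each statement well under the
            // transaction size limits instead of inserting row by row.
            stmt.executeUpdate("INSERT INTO t (id, val) VALUES (1, 'a'), (2, 'b'), (3, 'c')");
        }
    }
}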

| username: zhaokede | Original post link

Take a look at the plan and see which step is slow.

| username: 小龙虾爱大龙虾 | Original post link

What’s so interesting about INSERT :joy_cat:

| username: zhanggame1 | Original post link

What if we add a few more processes to write concurrently?

| username: Jellybean | Original post link

When inserting data, check the heatmap on the Dashboard to see if there are any hotspots.

Also, pay attention to the TiKV monitoring to see if there are any performance bottlenecks.

| username: TiDBer_H5NdJb5Q | Original post link

I agree with the above. Check if there are any hotspot regions, and monitor at the system level to see if memory or I/O is the bottleneck.
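
If it helps, hotspot regions can also be checked from SQL rather than the Dashboard. A rough sketch against TiDB's information_schema.TIDB_HOT_REGIONS system table; connection details are placeholders, and the available columns can vary by version:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HotRegionCheck {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://127.0.0.1:4000/test"; // placeholder connection info
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT db_name, table_name, flow_bytes "
                   + "FROM information_schema.tidb_hot_regions WHERE type = 'write'")) {
            while (rs.next()) {
                // A few rows with far higher flow_bytes than the rest suggest a write hotspot.
                System.out.printf("%s.%s flow_bytes=%d%n",
                        rs.getString("db_name"), rs.getString("table_name"),
                        rs.getLong("flow_bytes"));
            }
        }
    }
}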

| username: Jellybean | Original post link

You can refer to the troubleshooting process in the article:

| username: TiDBer_xV0VbTdz | Original post link

I’m not very good at reading heatmaps, experts.

| username: 小于同学 | Original post link

Take a look at the execution plan.

| username: zhaokede | Original post link

The heatmap is really hot.
Is the disk backing TiKV an SSD or an HDD?

| username: 友利奈绪 | Original post link

High concurrency can lead to slightly poorer performance on mechanical hard drives.

| username: TiDBer_BLeyZ3MR | Original post link

Mechanical disk

| username: TiDBer_ZxWlj6A1 | Original post link

HDDs don’t seem suitable for this workload.

| username: TiDBer_H5NdJb5Q | Original post link

It looks like all the machines are quite hot.

| username: 霸王龙的日常 | Original post link

SSDs are recommended for TiKV to prevent disk I/O from becoming a bottleneck.

| username: tidb菜鸟一只 | Original post link

  1. Can you share the table structure to see if it can be optimized?
  2. Even if you optimize the table structure, a mechanical disk will still limit performance, so don’t expect a big improvement.

| username: FutureDB | Original post link

It’s also worth checking the lock information when concurrency is high.

| username: 随缘天空 | Original post link

The heatmap is very bright, so this is likely a hotspot issue. Is the primary key auto-incrementing? Also, check whether the cluster’s QPS is balanced; the writes might be concentrated on a single machine.
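
If the primary key really is auto-incrementing, a common mitigation is to scatter writes with AUTO_RANDOM (or SHARD_ROW_ID_BITS for tables without an integer clustered primary key). A hedged sketch for a new table; the table and column names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ScatteredKeySketch {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://127.0.0.1:4000/test"; // placeholder connection info
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement()) {
            // AUTO_RANDOM spreads newly inserted rows across regions instead of
            // always appending to the last region, which is the classic write
            // hotspot caused by an auto-increment primary key.
            stmt.execute("CREATE TABLE t_scattered ("
                       + "  id BIGINT PRIMARY KEY AUTO_RANDOM,"
                       + "  val VARCHAR(255))");
        }
    }
}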

| username: TiDBer_H5NdJb5Q | Original post link

Can batching be enabled on the JDBC connection?
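
If you mean client-side batching with MySQL Connector/J: rewriteBatchedStatements=true lets the driver rewrite a PreparedStatement batch into multi-value INSERTs. A minimal sketch; the connection details and table t are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcBatchSketch {
    public static void main(String[] args) throws SQLException {
        // rewriteBatchedStatements=true lets Connector/J turn the batch below
        // into multi-value INSERT statements.
        String url = "jdbc:mysql://127.0.0.1:4000/test?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t (id, val) VALUES (?, ?)")) {
                for (int i = 0; i < 5000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "v" + i);
                    ps.addBatch();
                }
                ps.executeBatch(); // sent as one or a few large INSERT statements
            }
            conn.commit();
        }
    }
}

If I remember right, the Java best-practice docs also suggest useServerPrepStmts and cachePrepStmts on the URL, but batch size and the hotspot pattern usually matter more than the URL flags.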