High Server CPU Load During Batch Data Insertion in Projects

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 项目批量写入数据的时候服务器CPU负载高

| username: SoHuDrgon

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.0
[Encountered Problem: Symptoms and Impact]
This is a Java project that pulls CSV files from FTP daily. Each CSV file contains over 1 million rows. The Java project converts these rows into SQL statements and batch-inserts them into TiDB. During the insertion, however, TiDB’s CPU usage spikes dramatically: the load average on the 40-core server reaches around 80–100.

Optimizations attempted:

  1. Adjusted JDBC parameters:
    useSSL=false&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=256&prepStmtCacheSqlLimit=2048&rewriteBatchedStatements=true&allowMultiQueries=true&useConfigs=maxPerformance
    The issue still persists.
  2. Adjusted the number of rows per batch INSERT, reducing it from 2000 to 1000, then to 500, and finally to 400. The issue still persists.
  3. Created the table with /*T! SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=2 */ to pre-split Regions and scatter the row IDs.
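For reference, the CSV-to-SQL conversion described above can be sketched as a helper that chunks rows into multi-row INSERT statements of a configurable size. This is a minimal illustration only: the table and column names (`loan_record`, `CUSTOMER_NUMBER`, `PHONE`) are hypothetical, and production code should keep binding values through `PreparedStatement` parameters (as the JDBC settings in item 1 already assume) rather than concatenating strings.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSqlBuilder {
    // Chunk rows into multi-row INSERT statements, `rowsPerStatement` rows each.
    // Table/column names are illustrative placeholders, not from the thread.
    public static List<String> buildInserts(List<String[]> rows, int rowsPerStatement) {
        List<String> statements = new ArrayList<>();
        StringBuilder sb = null;
        int count = 0;
        for (String[] row : rows) {
            if (count == 0) {
                sb = new StringBuilder(
                    "INSERT INTO loan_record (CUSTOMER_NUMBER, PHONE) VALUES ");
            } else {
                sb.append(", ");
            }
            // NOTE: for illustration only; real code must use bound parameters
            sb.append("('").append(row[0]).append("', '").append(row[1]).append("')");
            count++;
            if (count == rowsPerStatement) {
                statements.add(sb.toString());
                count = 0;
            }
        }
        if (count > 0) {
            statements.add(sb.toString()); // final partial batch
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new String[] {"C" + i, "1380000000" + i});
        }
        // 5 rows, 2 per statement -> 3 statements (2 + 2 + 1 rows)
        List<String> stmts = buildInserts(rows, 2);
        System.out.println(stmts.size());
    }
}
```

Lowering `rowsPerStatement` (2000 → 400, as tried above) only changes how the same write volume is packaged; it does not reduce the total index-maintenance work per row, which is why the CPU spike persisted.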

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

If you need any additional information, please let me know, and I will provide the monitoring data for your review.

| username: zhanggame1 | Original post link

There isn’t much of a good solution. If you can reduce the indexes on the table, it can significantly increase the insertion speed. Also, avoid inserting one row at a time; inserting ten thousand rows in a single insert statement shouldn’t be a problem.

| username: SoHuDrgon | Original post link

I am currently using a single insert with 1000 rows of data. Additionally, the table has the following indexes:
KEY idx_CUSTOMER_NUMBER (CUSTOMER_NUMBER),
KEY idx_IDENTITY_NUMBER (IDENTITY_NUMBER),
KEY idx_PHONE (PHONE),
KEY idx_LOAN_CARD_NO (LOAN_CARD_NO),
KEY idx_MERCHANT_NAME (CUSTOMER_NAME),
KEY idx_LOAN_START_DATE (LOAN_START_DATE),
KEY idx_IS_REPAYMENT (IS_REPAYMENT),
KEY idx_LOAN_END_DATE (LOAN_END_DATE),
KEY idx_PRODUCT_TYPE (PRODUCT_TYPE),
KEY idx_IS_UPLOADED (IS_UPLOADED)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T! SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=2 */

| username: zhanggame1 | Original post link

You have too many indexes, which will seriously affect insert performance.

| username: 大飞哥online | Original post link

There are so many indexes, and every one of them has to be maintained for each row during batch inserts. Drop them, and recreate them after the import is complete.
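The drop-and-recreate cycle suggested here might look like the following sketch. The table name `loan_record` is hypothetical (the thread only shows the index definitions), and only two of the ten indexes are shown:

```sql
-- Before the daily import: drop secondary indexes so each INSERT
-- writes only the row record instead of the row plus ten index entries.
ALTER TABLE loan_record DROP INDEX idx_PHONE;
ALTER TABLE loan_record DROP INDEX idx_IDENTITY_NUMBER;

-- After the import finishes: rebuild only the indexes that queries
-- actually use. TiDB builds secondary indexes online with ADD INDEX.
ALTER TABLE loan_record ADD INDEX idx_PHONE (PHONE);
```

Note the trade-off: rebuilding an index over a table this size is itself a heavy background operation, so this is worthwhile mainly when most of the ten indexes turn out to be unnecessary.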

| username: SoHuDrgon | Original post link

Okay, thank you everyone. I removed all the indexes today, and I’ll observe the results tomorrow!

| username: 大飞哥online | Original post link

Optimize your indexes as well. Do you really need that many indexes? Consider it in conjunction with your business needs.

| username: 昵称想不起来了 | Original post link

Or use TiDB’s own import tool directly; it is likely better optimized than handling this yourself: TiDB Lightning Parallel Import | PingCAP Docs
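For reference, a minimal tidb-lightning configuration for loading CSV files might look like the sketch below. All paths and addresses are placeholders. The physical import mode (`backend = "local"`) shown here writes sorted key-value data directly to TiKV, bypassing the TiDB SQL layer and thus the per-INSERT CPU cost that this thread is about:

```toml
# tidb-lightning.toml -- illustrative values only
[tikv-importer]
# "local" = physical import mode: encodes rows into SST files and ingests
# them into TiKV directly, skipping SQL execution on the TiDB servers.
backend = "local"
sorted-kv-dir = "/mnt/ssd/sorted-kv"

[mydumper]
# CSV files must be named so Lightning can route them to the target table,
# e.g. mydb.loan_record.csv
data-source-dir = "/data/csv"

[mydumper.csv]
separator = ","
header = true

[tidb]
host = "tidb.example.com"
port = 4000
user = "root"
password = ""
status-port = 10080
pd-addr = "pd.example.com:2379"
```

One caveat for a recurring daily job: physical import mode is designed for bulk-loading into tables without conflicting data, so whether it fits depends on whether each day’s CSV goes into a fresh table or partition.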

| username: Fly-bird | Original post link

Optimize index

| username: SoHuDrgon | Original post link

Thank you, everyone. Today I observed that without indexes, the system load is very low.



I also noticed in the EXPLAIN output that the indexes created by the developers were not being used at all.

No more indexes will be created.
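A quick way to verify this before adding any index back is to check the execution plan of a representative query; the table and predicate below are hypothetical:

```sql
-- If the plan shows TableFullScan instead of IndexRangeScan / IndexLookUp,
-- the index on PHONE is not being used by this query.
EXPLAIN SELECT * FROM loan_record WHERE PHONE = '13800000000';
```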

| username: zhanggame1 | Original post link

Remove all the indexes, and if you find that there are indeed slow queries that need indexes, add them back.

| username: 大飞哥online | Original post link

That’s right. Queries that run frequently should have indexes; for queries that run only occasionally, don’t create them.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.