What are the optimization solutions for slow SQL inserts in TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb慢sql插入有什么优化方案? (What are the optimization options for slow SQL inserts in TiDB?)

| username: TiDBer_y9IRzLWc

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
Currently, in the slow queries page of the Dashboard, we see a lot of slow SQL statements containing REPLACE INTO. The total execution time exceeds 30 seconds, and the maximum memory usage per statement is between 900 KiB and 2 MiB. These appear to be batch insert operations. Could you advise on optimization methods for this situation? Should we change the batch SQL into single-row SQL, or reduce the memory footprint of each SQL statement?

| username: tidb菜鸟一只 | Original post link

First, check whether the corresponding table has a write hotspot.
Then check the JDBC connection settings.
Are useServerPrepStmts and cachePrepStmts set to true?
Note that prepStmtCacheSqlLimit and prepStmtCacheSize should also be set to larger values; otherwise, just setting the two options above to true won't take effect.
Are you using batch inserts? If not, use them.
If you are, remember to also set rewriteBatchedStatements=true.
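The settings above all go into the JDBC connection URL. A minimal sketch of such a URL (the host, port, database name, and the cache-size values here are placeholders for illustration, not values from the thread; the parameter names are MySQL Connector/J options that also apply when connecting to TiDB):

```java
// Sketch: building a JDBC URL that enables the options discussed above.
// 127.0.0.1:4000/test and the numeric limits are illustrative placeholders.
public class JdbcUrl {
    public static String url() {
        return "jdbc:mysql://127.0.0.1:4000/test"
            + "?useServerPrepStmts=true"        // use server-side prepared statements
            + "&cachePrepStmts=true"            // cache prepared statements client-side
            + "&prepStmtCacheSqlLimit=10240"    // raise the max SQL length eligible for caching
            + "&prepStmtCacheSize=1000"         // raise the number of cached statements
            + "&rewriteBatchedStatements=true"; // rewrite addBatch() into multi-value inserts
    }

    public static void main(String[] args) {
        System.out.println(url());
    }
}
```

With rewriteBatchedStatements=true, Connector/J can send an executeBatch() of single-row inserts as one multi-value statement, which usually reduces round trips.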

| username: TiDBer_y9IRzLWc | Original post link

I need to check the code for this. I just got here and don’t know how it’s configured in the code.

| username: hey-hoho | Original post link

Check the Dashboard to see which stage is causing the slowness.

| username: Jellybean | Original post link

First, identify the cause of the slowness, and then develop targeted optimization plans based on the cause.

Currently, it is unclear where the slowness is occurring, and the optimization measures suggested by others may not necessarily be suitable for your business scenario.

| username: redgame | Original post link

You can try switching to single-row inserts. Changing batch insert SQL into single-row inserts can sometimes shorten the time transaction locks are held and improve concurrency.
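Between one huge batch and fully single-row inserts, a common middle ground is to split the batch into smaller chunks so each transaction commits sooner and holds locks for less time. A minimal sketch of the chunking logic (the batch size of 500 is an illustrative value, not one from the thread):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Split `rows` into consecutive chunks of at most `batchSize` rows,
    // so each chunk can be sent and committed as its own smaller transaction.
    public static <T> List<List<T>> split(List<T> rows, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size()))));
        }
        return chunks;
    }
}
```

Each chunk would then be written in its own transaction, trading a little throughput for much shorter lock-hold times.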

| username: Kongdom | Original post link

I just have one question: it's normal for batch inserts to show up as slow SQL. Is the slow insertion actually affecting the application, or is there a requirement to shorten the insertion time?

Also, I feel that “REPLACE INTO” has performance issues. Why not use “INSERT INTO” directly?

| username: TiDBer_y9IRzLWc | Original post link

Mainly for deduplicating data.

| username: zhaokede | Original post link

You can separate the data writing and deduplication into two steps.
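One way to sketch this two-step idea: first bulk-load everything into a staging table with plain INSERT (no uniqueness check, so it stays fast), then merge distinct, not-yet-present rows into the target table. The table and column names below (staging_t, target_t, id, col1) are hypothetical placeholders, not from the thread:

```java
// Illustrative SQL for the two-step write-then-deduplicate approach.
// All identifiers are hypothetical.
public class TwoStepDedup {
    // Step 1: fast bulk load into a staging table with plain INSERT.
    public static final String STEP1_LOAD =
        "INSERT INTO staging_t (id, col1) VALUES (?, ?)";

    // Step 2: merge distinct rows that are not already in the target table.
    public static final String STEP2_MERGE =
        "INSERT INTO target_t (id, col1) "
        + "SELECT DISTINCT id, col1 FROM staging_t s "
        + "WHERE NOT EXISTS (SELECT 1 FROM target_t t WHERE t.id = s.id)";
}
```

This moves the deduplication cost out of the hot write path into a separate, schedulable step.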

| username: TiDBer_y9IRzLWc | Original post link

Do you mean to insert all the data first and then have another service deduplicate it, or to deduplicate only after all the data has been inserted?

| username: TiDBer_y9IRzLWc | Original post link

The slowness is mainly due to table locking and the long transaction times during the batch inserts with REPLACE INTO.

| username: Kongdom | Original post link

I understand it's for deduplication. What I mean is: why is there duplicate data in the first place, and could duplicates be filtered out before insertion? Pushing all the data into the database just moves the performance bottleneck there; it doesn't solve the underlying problem.
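Filtering duplicates on the application side before inserting can be sketched with a map keyed on the row's unique key, which then allows plain INSERT instead of REPLACE INTO. The row layout here (String[] with the key in column 0) is a hypothetical simplification:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Dedup {
    // Keep only the last row seen for each key, preserving first-seen order.
    // row[0] is assumed to be the unique-key column of the hypothetical row layout.
    public static List<String[]> dedupByKey(List<String[]> rows) {
        Map<String, String[]> byKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            byKey.put(row[0], row); // a later duplicate overwrites the earlier row
        }
        return new ArrayList<>(byKey.values());
    }
}
```

The deduplicated list can then be written with ordinary batched INSERTs, avoiding the delete-then-insert cost that REPLACE INTO incurs on key conflicts.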

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.