SQL Optimization in TiDB: Normal Execution in MySQL, Slow Execution in TiDB Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDb中SQL优化,在MySQL中执行正常,在TiDb集群下执行很慢

| username: TiDBer_ly0m6BIW

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.5.1
[Reproduction Path]
Executing the following SQL in TiDB takes about 8 seconds, but in MySQL it takes about 600 milliseconds. The overall configuration of the TiDB cluster is similar to MySQL.

SELECT * FROM tr_entrust
where Id in (select TestId from tr_testitems where ReportNo like '%jsc%' or TaskNo like '%jsc%') or EntrustNo like '%jsc%'

Execution plan in TiDB

Execution plan in MySQL

Is there any optimization method for this issue?

| username: xfworld | Original post link

Try modifying the indexes.

The queries for the two tables at the bottom are both full table scans, which will get slower as the data increases.

None of the indexes are being hit…

| username: TiDBer_5cwU0ltE | Original post link

In MySQL, an execution plan is used, but TiDB still performs a full table scan. In this case, check the index situation. Is it possible that there is no corresponding index on TiDB?

| username: TiDBer_ly0m6BIW | Original post link

It should not be an index issue. When the above SQL is modified to the following form, the execution speed is normal:

explain analyze SELECT * FROM tr_entrust
where Id in (select TestId from tr_testitems where ReportNo like '%jsc%' or TaskNo like '%jsc%')
union
SELECT * FROM tr_entrust
where EntrustNo like '%jsc%'

It seems to be caused by the structure of Id (select …) or like this:

Id (select TestId from tr_testitems where ReportNo like '%jsc%' or TaskNo like '%jsc%') or  EntrustNo like '%jsc%'

Currently, there are too many similar SQLs in the system, and modifying them one by one is too much work. Is there any configuration in TiDB that can solve this problem?

Also, I haven’t figured out why queries with the structure Id (select …) or are so much slower.

| username: forever | Original post link

The above hash join took 5 seconds, while the index hash join below took 47 milliseconds. The ID should be the primary key. It might be because you added an OR condition, which caused a full table hash. Try modifying the SQL to use a join instead.

| username: 随缘天空 | Original post link

What is the total data volume?

| username: TiDBer_ly0m6BIW | Original post link

Tens of thousands.

| username: 随缘天空 | Original post link

Such a small amount of data does not reflect the advantages of the TiDB cluster. However, the execution time is indeed too slow. Are the table structures of the two consistent, especially the indexes?

| username: TiDBer_ly0m6BIW | Original post link

From the execution plan, it can be seen that scanning the table in TiKV did not take much time; the time was mainly spent on executing HashJoin in TiDB.

| username: TiDBer_ly0m6BIW | Original post link

The structures of the two tables are inconsistent, and the indexes are also different.

| username: residentevil | Original post link

You can suggest checking the execution plan of the subquery SQL first. Also, I noticed that the SQL filter conditions you posted are all fuzzy queries with % at both ends. I have tested this myself, and in terms of performance, there is not much difference between TiDB and native MySQL.

| username: TiDBer_ly0m6BIW | Original post link

The cause of the issue has been identified as being due to the left outer semi join operator. The optimization plan is detailed in The Impact of Null-Aware Issues on the TiDB Optimizer (OOM).

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.