The same query becomes very slow after adding an in() clause

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同样语句加了一行in(),速度变的非常慢

| username: Jjjjayson_zeng

【TiDB Usage Environment】Production Environment
【TiDB Version】
【Reproduction Path】What operations were performed when the issue occurred
【Encountered Issue: Issue Phenomenon and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】
Statements

use.sql (3.1 KB)


Previous Execution Plan.xlsx (2.9 KB)
Subsequent Execution Plan.xlsx (3.6 KB)

| username: 芮芮是产品 | Original post link

The indexes being used are also different.

| username: Jjjjayson_zeng | Original post link

Do you have any good suggestions?

| username: zhanggame1 | Original post link

First, analyze the table to see if it can be faster.

| username: TiDBer_PyN8ad50 | Original post link

  1. Check if ANALYZE TABLE is useful.
  2. Add a hint to force the index.
| username: Jjjjayson_zeng | Original post link

Error occurred

| username: Jjjjayson_zeng | Original post link

I’ll try forcing the index first.

| username: forever | Original post link

The tables in the two SQL queries are different, not just the conditions, right?

| username: 芮芮是产品 | Original post link

Please post the table structure.

| username: Jjjjayson_zeng | Original post link

Table Structure.sql (1.4 KB)

| username: Jjjjayson_zeng | Original post link

Of course, it’s the same.

| username: 芮芮是产品 | Original post link

Please post the complete SQL.

| username: Jjjjayson_zeng | Original post link

This is complete
image

| username: 小龙虾爱大龙虾 | Original post link

Previous execution plan: The execution plan only included tables a and b. Other tables were removed during logical optimization because tables c, cc, and atc were joined using left join, and the join columns had unique keys. The final SQL query columns did not include c, cc, or atc, so removing these three tables did not affect the execution result. The optimizer recognized this and directly removed these three tables.
Subsequent execution plan: Due to the added condition in act.datatype (xxx), the join with the atc table changed from a left join to an inner join, which affected the final SQL result. Therefore, the atc table could not be removed. However, I don’t quite understand why the c and cc tables were not removed.
Waiting for other experts to analyze :face_with_monocle:

| username: 江湖故人 | Original post link

The optimizer eliminated the left join from the original SQL, so the original SQL is much faster.
The following SQL statements are equivalent:

SELECT DISTINCT T1.* FROM T1 LEFT JOIN T2 ON T1.C1 = T2.C1; 
SELECT DISTINCT T1.* FROM T1;

You can try commenting out the following two lines:

           -- LEFT JOIN hengyi.ORGSTDSTRUCT c ON b.BRANCHID = c.UNITID
           -- left join hengyi.cpc_job cc on b.JOBCODE=cc.id
| username: Jjjjayson_zeng | Original post link

Now I can solve it by writing this as a subquery and then adding conditions outside, which makes it faster. But I really don’t understand the principle behind it.

| username: Jjjjayson_zeng | Original post link

Moreover, the same SQL and the same data run very fast on MySQL.

| username: Jjjjayson_zeng | Original post link

But it can’t run on TiDB because of this “in”.

| username: Jjjjayson_zeng | Original post link

I’ll give it a try.

| username: 随缘天空 | Original post link

The field shown in your execution plan indicates that your statistics are inaccurate. Try using EXPLAIN ANALYZE TABLE table_name to execute and see.