Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: join reorder没有对等值关联条件做转换

[TiDB Usage Environment] Test
[TiDB Version] v7.5.1
The execution efficiency of tpch’s Q5 is relatively slow, partly because the equivalent join condition is not converted. Post: 因优化器问题导致TPCH的Q5语句执行过慢 - TiDB 的问答社区
However, the content of the above post is extensive and does not focus on this issue, so here we simplify the test for this problem.
Create the relevant table structure:
drop table if exists a;
drop table if exists b;
drop table if exists c;
create table a(id int, c1 int, c2 int, primary key (id));
create table b(id int, c1 int, c2 int, primary key (id));
create table c(id int, c1 int, c2 int, primary key (id));
insert into a values (1,1,1), (2,2,2), (3,3,3);
insert into b values (1,1,1), (10,10,10), (11,11,11), (12,12,12);
insert into c values (1,1,1), (2,2,2);
analyze table a, b, c with 1 samplerate;
Check the execution plan of the SQL statement: select count(*) from a, b, c where a.id=b.id and b.id=c.id and c.c1=a.c1:
mysql> explain select count(*) from a, b, c where a.id=b.id and b.id=c.id and c.c1=a.c1;
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| StreamAgg_15 | 1.00 | root | | funcs:count(1)->Column#10 |
| └─HashJoin_45 | 2.00 | root | | inner join, equal:[eq(test.a.id, test.b.id) eq(test.c.id, test.b.id)] |
| ├─HashJoin_27(Build) | 2.00 | root | | inner join, equal:[eq(test.c.c1, test.a.c1)] |
| │ ├─TableReader_30(Build) | 2.00 | root | | data:Selection_29 |
| │ │ └─Selection_29 | 2.00 | cop[tikv] | | not(isnull(test.c.c1)) |
| │ │ └─TableFullScan_28 | 2.00 | cop[tikv] | table:c | keep order:false |
| │ └─TableReader_33(Probe) | 3.00 | root | | data:Selection_32 |
| │ └─Selection_32 | 3.00 | cop[tikv] | | not(isnull(test.a.c1)) |
| │ └─TableFullScan_31 | 3.00 | cop[tikv] | table:a | keep order:false |
| └─TableReader_35(Probe) | 4.00 | root | | data:TableFullScan_34 |
| └─TableFullScan_34 | 4.00 | cop[tikv] | table:b | keep order:false |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
11 rows in set (0.01 sec)
It can be seen that the optimizer first chooses a join c and then associates with b, but the condition for a join c is only: equal:[eq(test.c.c1, test.a.c1)]
and does not perceive that a.id=b.id and b.id=c.id imply a.id=c.id, so it does not filter a.id=c.id in advance. Manually optimized to:
mysql> explain select count(*) from a, b, c where a.id=b.id and b.id=c.id and c.c1=a.c1 and a.id=c.id;
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| StreamAgg_15 | 1.00 | root | | funcs:count(1)->Column#10 |
| └─HashJoin_65 | 2.00 | root | | inner join, equal:[eq(test.a.id, test.b.id) eq(test.c.id, test.b.id)] |
| ├─HashJoin_47(Build) | 2.00 | root | | inner join, equal:[eq(test.c.c1, test.a.c1) eq(test.c.id, test.a.id)] |
| │ ├─TableReader_50(Build) | 2.00 | root | | data:Selection_49 |
| │ │ └─Selection_49 | 2.00 | cop[tikv] | | not(isnull(test.c.c1)) |
| │ │ └─TableFullScan_48 | 2.00 | cop[tikv] | table:c | keep order:false |
| │ └─TableReader_53(Probe) | 3.00 | root | | data:Selection_52 |
| │ └─Selection_52 | 3.00 | cop[tikv] | | not(isnull(test.a.c1)) |
| │ └─TableFullScan_51 | 3.00 | cop[tikv] | table:a | keep order:false |
| └─TableReader_55(Probe) | 4.00 | root | | data:TableFullScan_54 |
| └─TableFullScan_54 | 4.00 | cop[tikv] | table:b | keep order:false |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
11 rows in set (0.00 sec)
In complex SQL statements, it is difficult to manually perform such transformations. It is hoped that the optimizer can optimize the equivalent conversion of related fields.