Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: TiDB优化器未对关联条件中“or”进行改写
For the TPCH table, there is the following test statement (this statement has no meaning, only to examine or rewrite ability):
select count(a.S_NAME), sum(b.s_acctbal) from supplier a, supplier b where a.S_ADDRESS = b.S_ADDRESS or a.S_PHONE = b.S_PHONE;
Check its execution plan:
mysql> explain select count(a.S_NAME), sum(b.s_acctbal) from supplier a, supplier b where a.S_ADDRESS = b.S_ADDRESS or a.S_PHONE = b.S_PHONE;
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_10 | 1.00 | root | | funcs:count(tpch1.supplier.s_name)->Column#15, funcs:sum(tpch1.supplier.s_acctbal)->Column#16 |
| └─HashJoin_17 | 10000000000.00 | root | | CARTESIAN inner join, other cond:or(eq(tpch1.supplier.s_address, tpch1.supplier.s_address), eq(tpch1.supplier.s_phone, tpch1.supplier.s_phone)) |
| ├─TableReader_16(Build) | 100000.00 | root | | data:TableFullScan_15 |
| │ └─TableFullScan_15 | 100000.00 | cop[tikv] | table:b | keep order:false |
| └─TableReader_14(Probe) | 100000.00 | root | | data:TableFullScan_13 |
| └─TableFullScan_13 | 100000.00 | cop[tikv] | table:a | keep order:false |
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
mysql> explain analyze select a.l_partkey from lineitem a cross join lineitem b limit 10;
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
| Limit_12 | 10.00 | 10 | root | | time:2m46.8s, loops:2 | offset:0, count:10 | N/A | N/A |
| └─HashJoin_13 | 10.00 | 1024 | root | | time:2m46.8s, loops:1, build_hash_table:{total:17.4s, fetch:10.5s, build:6.87s}, probe:{concurrency:3, total:8m23.6s, max:2m48.7s, probe:7m31.4s, fetch:52.1s} | CARTESIAN inner join | 1023.6 MB | 1.34 GB |
| ├─TableReader_18(Build) | 59986052.00 | 59986052 | root | | time:10.3s, loops:58813, cop_task: {num: 2197, max: 281.3ms, min: 1.1ms, avg: 31.4ms, p95: 68.4ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 56.1s, tot_wait: 9.09s, rpc_num: 2197, rpc_time: 1m9s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 5} | data:TableFullScan_17 | 1.86 MB | N/A |
| │ └─TableFullScan_17 | 59986052.00 | 59986052 | cop[tikv] | table:b | tikv_task:{proc max:271ms, min:0s, avg: 25.6ms, p80:42ms, p95:58ms, iters:67264, tasks:2197}, scan_detail: {total_process_keys: 59986052, total_process_keys_size: 2159497872, total_keys: 59988249, get_snapshot_time: 204.3ms, rocksdb: {key_skipped_count: 59986052, block: {cache_hit_count: 17464, read_count: 185633, read_byte: 3.77 GB, read_time: 15.7s}}} | keep order:false | N/A | N/A |
| └─TableReader_16(Probe) | 0.00 | 3072 | root | | time:10.2ms, loops:3, cop_task: {num: 9, max: 11.9ms, min: 956.6µs, avg: 6.62ms, p95: 11.9ms, max_proc_keys: 992, p95_proc_keys: 992, tot_proc: 22ms, tot_wait: 26ms, rpc_num: 9, rpc_time: 59.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 5} | data:TableFullScan_15 | 48.4 KB | N/A |
| └─TableFullScan_15 | 0.00 | 3552 | cop[tikv] | table:a | tikv_task:{proc max:9ms, min:0s, avg: 2.78ms, p80:9ms, p95:9ms, iters:32, tasks:9}, scan_detail: {total_process_keys: 3552, total_process_keys_size: 697625, total_keys: 3561, get_snapshot_time: 18.2ms, rocksdb: {key_skipped_count: 3552, block: {cache_hit_count: 71, read_count: 14, read_byte: 750.3 KB, read_time: 10.4ms}}} | keep order:false | N/A | N/A |
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
6 rows in set (2 min 48.75 sec)
It can be seen that the use of hash join + internal cross join results in a very slow hash join process. If a union rewrite can be performed, the efficiency in this scenario would be much higher. Therefore, should cost-based execution plan rewriting be enhanced to optimize such issues?