IndexJoin cannot be used when the ON equality condition of a JOIN contains string functions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 当JOIN的ON等值条件包含字符串函数时不能走IndexJoin

| username: 人如其名

[Problem Scenario Involved in the Requirement] select * from a,b where a.id=b.id (using index) and substr(a.name,1,10)=substr(b.name,1,10), unable to use IndexJoin with specified hint

[Expected Requirement Behavior] Ability to force IndexJoin

Table structure:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`C_PHONE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

mysql> show create table orders \G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `O_ORDERKEY` bigint(20) NOT NULL,
  `O_CUSTKEY` bigint(20) NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` bigint(20) NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`O_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

Statement: explain analyze select /*+ INL_JOIN(a,b) / count() from customer a,orders b where a.c_phone=‘25-989-741-2988’ and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
b.o_custkey is an indexed field, b.o_comment is not an indexed field, IndexJoin algorithm can be used normally.

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                                   | estRows | actRows | task      | access object                  | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                                                                                                                                                 | memory  | disk |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| StreamAgg_10                         | 1.00    | 1       | root      |                                | time:16.8ms, loops:2, RU:2.519252                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | funcs:count(1)->Column#18                                                                                                                                                                                                     | 8 Bytes | N/A  |
| └─IndexJoin_43                       | 1.04    | 0       | root      |                                | time:16.8ms, loops:1, inner:{total:14.9ms, concurrency:5, task:1, construct:11.9ms, fetch:2.97ms, build:28.2µs}, probe:2.48µs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | inner join, inner:IndexLookUp_42, outer key:tpch1.customer.c_custkey, inner key:tpch1.orders.o_custkey, equal cond:eq(tpch1.customer.c_comment, tpch1.orders.o_comment), eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) | 51.1 KB | N/A  |
|   ├─IndexLookUp_37(Build)            | 1.04    | 1       | root      |                                | time:1.56ms, loops:3, index_task: {total_time: 750.8µs, fetch_handle: 748.9µs, build: 728ns, wait: 1.12µs}, table_task: {total_time: 708.2µs, num: 1, concurrency: 5}, next: {wait_index: 889.4µs, wait_table_lookup_build: 0s, wait_table_lookup_resp: 648.9µs}                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                                                                                               | 2.13 KB | N/A  |
|   │ ├─IndexRangeScan_35(Build)       | 1.04    | 1       | cop[tikv] | table:a, index:idx1(C_PHONE)   | time:747.4µs, loops:3, cop_task: {num: 1, max: 668.4µs, proc_keys: 1, tot_proc: 92.2µs, tot_wait: 93.2µs, rpc_num: 1, rpc_time: 646.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 17.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 24.2µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}                                                                                                                                                                              | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                                                                                                                                                 | N/A     | N/A  |
|   │ └─TableRowIDScan_36(Probe)       | 1.04    | 1       | cop[tikv] | table:a                        | time:630.3µs, loops:2, cop_task: {num: 1, max: 548µs, proc_keys: 1, tot_proc: 57µs, tot_wait: 71µs, rpc_num: 1, rpc_time: 533.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 24µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 14.4µs, rocksdb: {block: {cache_hit_count: 1}}}                                                                                                                                                                                 | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
|   └─IndexLookUp_42(Probe)            | 1.04    | 6       | root      |                                | time:2.81ms, loops:2, index_task: {total_time: 834.1µs, fetch_handle: 830.6µs, build: 804ns, wait: 2.64µs}, table_task: {total_time: 1.05ms, num: 1, concurrency: 5}, next: {wait_index: 1.71ms, wait_table_lookup_build: 98.5µs, wait_table_lookup_resp: 961.5µs}                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                               | 9.34 KB | N/A  |
|     ├─IndexRangeScan_40(Build)       | 1.04    | 6       | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | time:827.2µs, loops:3, cop_task: {num: 2, max: 713.5µs, min: 654µs, avg: 683.8µs, p95: 713.5µs, max_proc_keys: 6, p95_proc_keys: 6, tot_proc: 186µs, tot_wait: 255.8µs, rpc_num: 2, rpc_time: 1.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 814.7µs, max_distsql_concurrency: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 6, total_process_keys_size: 276, total_keys: 8, get_snapshot_time: 86.8µs, rocksdb: {key_skipped_count: 6, block: {cache_hit_count: 1, read_count: 1, read_byte: 3.81 KB, read_time: 12.4µs}}}           | range: decided by [eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)], keep order:false                                                                                                                                    | N/A     | N/A  |
|     └─TableRowIDScan_41(Probe)       | 1.04    | 6       | cop[tikv] | table:b                        | time:944.2µs, loops:2, cop_task: {num: 3, max: 825.8µs, min: 0s, avg: 275.3µs, p95: 825.8µs, max_proc_keys: 3, p95_proc_keys: 3, tot_proc: 190.9µs, tot_wait: 280.1µs, rpc_num: 1, rpc_time: 810.6µs, copr_cache_hit_ratio: 0.00, build_task_duration: 29.8µs, max_distsql_concurrency: 1, max_extra_concurrency: 1, store_batch_num: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:3, tasks:3}, scan_detail: {total_process_keys: 6, total_process_keys_size: 929, total_keys: 6, get_snapshot_time: 39.6µs, rocksdb: {block: {cache_hit_count: 6}}}                                          | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
8 rows in set (0.02 sec)

When a function is added to the join condition field a.C_COMMENT=b.O_COMMENT (b.O_COMMENT), IndexJoin is not used:

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and substr(a.C_COMMENT,1,10)=substr(b.O_COMMENT,1,10);
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| id                                     | estRows    | actRows | task      | access object                | execution info                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                     | memory   | disk    |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| StreamAgg_13                           | 1.00       | 1       | root      |                              | time:345ms, loops:2, RU:2322.550974                                                                                                                                                                                                                                                                                                                                                                                                          | funcs:count(1)->Column#18                                                                         | 8 Bytes  | N/A     |
| └─HashJoin_27                          | 1.04       | 0       | root      |                              | time:345ms, loops:1, build_hash_table:{total:3.87ms, fetch:3.85ms, build:13.6µs}, probe:{concurrency:5, total:1.72s, max:344.8ms, probe:165.2ms, fetch:1.56s}                                                                                                                                                                                                                                                                                | inner join, equal:[eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) eq(Column#19, Column#20)] | 17.8 KB  | 0 Bytes |
|   ├─Projection_16(Build)               | 1.04       | 1       | root      |                              | time:3.84ms, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                        | tpch1.customer.c_custkey, substr(tpch1.customer.c_comment, 1, 10)->Column#19                      | 17.5 KB  | N/A     |
|   │ └─IndexLookUp_22                   | 1.04       | 1       | root      |                              | time:3.81ms, loops:2, index_task: {total_time: 1.06ms, fetch_handle: 1.05ms, build: 832ns, wait: 1.3µs}, table_task: {total_time: 2.55ms, num: 1, concurrency: 5}, next: {wait_index: 1.33ms, wait_table_lookup_build: 0s, wait_table_lookup_resp: 2.47ms}                                                                                                                                                                                   |                                                                                                   | 34.2 KB  | N/A     |
|   │   ├─IndexRangeScan_20(Build)       | 1.04       | 1       | cop[tikv] | table:a, index:idx1(C_PHONE) | time:1.05ms, loops:3, cop_task: {num: 1, max: 968.3µs, proc_keys: 1, tot_proc: 40.9µs, tot_wait: 398.6µs, rpc_num: 1, rpc_time: 955.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 11.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 11.1µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}       | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                     | N/A      | N/A     |
|   │   └─TableRowIDScan_21(Probe)       | 1.04       | 1       | cop[tikv] | table:a                      | time:2.46ms, loops:2, cop_task: {num: 1, max: 2.34ms, proc_keys: 1, tot_proc: 55.4µs, tot_wait: 88.8µs, rpc_num: 1, rpc_time: 2.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 32.7µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 19.3µs, rocksdb: {block: {cache_hit_count: 1}}}     | keep order:false                                                                                  | N/A      | N/A     |
|   └─Projection_23(Probe)               | 1500000.00 | 1500000 | root      |                              | time:331.8ms, loops:1471, Concurrency:5                                                                                                                                                                                                                                                                                                                                                                                                      | tpch1.orders.o_custkey, substr(tpch1.orders.o_comment, 1, 10)->Column#20                          | 495.7 KB | N/A     |
|     └─TableReader_25                   | 1500000.00 | 1500000 | root      |                              | time:92.3ms, loops:1471, cop_task: {num: 69, max: 94.7ms, min: 231.5µs, avg: 16.5ms, p95: 63ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 673.7ms, tot_wait: 23.6ms, rpc_num: 69, rpc_time: 1.13s, copr_cache_hit_ratio: 0.58, build_task_duration: 12µs, max_distsql_concurrency: 5}                                                                                                                                            | data:TableFullScan_24                                                                             | 15.4 MB  | N/A     |
|       └─TableFullScan_24               | 1500000.00 | 1500000 | cop[tikv] | table:b                      | tikv_task:{proc max:294ms, min:0s, avg: 61.3ms, p80:89ms, p95:259ms, iters:1737, tasks:69}, scan_detail: {total_process_keys: 893024, total_process_keys_size: 135281341, total_keys: 893053, get_snapshot_time: 1.02ms, rocksdb: {key_skipped_count: 893024, block: {cache_hit_count: 4574}}}                                                                                                                                               | keep order:false                                                                                  | N/A      | N/A     |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
9 rows in set, 2 warnings (0.35 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
| Warning | 1815 | Optimizer Hint /*+ IN
| username: 人如其名 | Original post link

Simplified Test:

-- Table structure:
create table a(id int, name varchar(100));
create table b(id int, name varchar(100));
alter table b add index(id);
insert into a values (1, '1'), (1, '11'), (2, '2');
insert into b values (1, '1'), (1, '22'), (2, '2');
-- Collect statistics
analyze table a;
analyze table b;

Supports IndexJoin
explain select /*+ INL_JOIN(a,b) / * from a, b where a.id = b.id and a.name = b.name;
explain select /
+ INL_JOIN(a,b) / * from a, b where a.id = b.id and a.name < substr(b.name, 1, 10);
explain select /
+ INL_JOIN(a,b) / * from a, b where a.id = b.id and a.name != substr(b.name, 1, 10);
Does not support IndexJoin
explain select /
+ INL_JOIN(a,b) / * from a, b where a.id = b.id and a.name = substr(b.name, 1, 10);
explain select /
+ INL_JOIN(a,b) */ * from a, b where a.id < b.id;

Strangely, a.name < substr(b.name, 1, 10) supports IndexJoin, but equality does not.
The specific execution plans are as follows:

-- Cases supporting IndexJoin
mysql> explain select /*+ INL_JOIN(a,b) */ * from a, b where a.id = b.id and a.name = b.name;
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows | task      | access object         | operator info                                                                                                                                 |
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_13                 | 6.00    | root      |                       | inner join, inner:IndexLookUp_12, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), eq(test.a.name, test.b.name) |
| ├─TableReader_25(Build)      | 6.00    | root      |                       | data:Selection_24                                                                                                                             |
| │ └─Selection_24             | 6.00    | cop[tikv] |                       | not(isnull(test.a.id)), not(isnull(test.a.name))                                                                                              |
| │   └─TableFullScan_23       | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                              |
| └─IndexLookUp_12(Probe)      | 6.00    | root      |                       |                                                                                                                                               |
|   ├─Selection_10(Build)      | 6.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                        |
|   │ └─IndexRangeScan_8       | 6.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                |
|   └─Selection_11(Probe)      | 6.00    | cop[tikv] |                       | not(isnull(test.b.name))                                                                                                                      |
|     └─TableRowIDScan_9       | 6.00    | cop[tikv] | table:b               | keep order:false                                                                                                                              |
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.06 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a, b where a.id = b.id and a.name < substr(b.name, 1, 10);
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                              | estRows | task      | access object         | operator info                                                                                                                                                           |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_12                    | 9.00    | root      |                       | inner join, inner:IndexLookUp_11, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), other cond:lt(test.a.name, substr(test.b.name, 1, 10)) |
| ├─TableReader_24(Build)         | 6.00    | root      |                       | data:Selection_23                                                                                                                                                       |
| │ └─Selection_23                | 6.00    | cop[tikv] |                       | not(isnull(test.a.id))                                                                                                                                                  |
| │   └─TableFullScan_22          | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                                                        |
| └─IndexLookUp_11(Probe)         | 9.00    | root      |                       |                                                                                                                                                                         |
|   ├─Selection_10(Build)         | 9.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                                                  |
|   │ └─IndexRangeScan_8          | 9.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                                          |
|   └─TableRowIDScan_9(Probe)     | 9.00    | cop[tikv] | table:b               | keep order:false                                                                                                                                                        |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a, b where a.id = b.id and a.name != substr(b.name, 1, 10);
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                              | estRows | task      | access object         | operator info                                                                                                                                                           |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_12                    | 9.00    | root      |                       | inner join, inner:IndexLookUp_11, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), other cond:ne(test.a.name, substr(test.b.name, 1, 10)) |
| ├─TableReader_24(Build)         | 6.00    | root      |                       | data:Selection_23                                                                                                                                                       |
| │ └─Selection_23                | 6.00    | cop[tikv] |                       | not(isnull(test.a.id))                                                                                                                                                  |
| │   └─TableFullScan_22          | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                                                        |
| └─IndexLookUp_11(Probe)         | 9.00    | root      |                       |                                                                                                                                                                         |
|   ├─Selection_10(Build)         | 9.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                                                  |
|   │ └─IndexRangeScan_8          | 9.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                                          |
|   └─TableRowIDScan_9(Probe)     | 9.00    | cop[tikv] | table:b               | keep order:false                                                                                                                                                        |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

-- Cases not supporting IndexJoin

mysql> explain select /*+ INL_JOIN(a,b) */ * from a, b where a.id = b.id and a.name = substr(b.name, 1, 10);
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
| id                           | estRows | task      | access object | operator info                                                          |
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
| HashJoin_12                  | 6.00    | root      |               | inner join, equal:[eq(test.a.id, test.b.id) eq(test.a.name, Column#7)] |
| ├─TableReader_15(Build)      | 6.00    | root      |               | data:Selection_14                                                      |
| │ └─Selection_14             | 6.00    | cop[tikv] |               | not(isnull(test.a.id))                                                 |
| │   └─TableFullScan_13       | 6.00    | cop[tikv] | table:a       | keep order:false                                                       |
| └─Projection_16(Probe)       | 6.00    | root      |               | test.b.id, test.b.name, substr(test.b.name, 1, 10)->Column#7           |
|   └─TableReader_19           | 6.00    | root      |               | data:Selection_18                                                      |
|     └─Selection_18           | 6.00    | cop[tikv] |               | not(isnull(test.b.id))                                                 |
|       └─TableFullScan_17     | 6.00    | cop[tikv] | table:b       | keep order:false                                                       |
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
8 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
+---------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a, b where a.id < b.id;
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
| id                           | estRows | task      | access object | operator info                                             |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
| HashJoin_8                   | 36.00   | root      |               | CARTESIAN inner join, other cond:lt(test.a.id, test.b.id) |
| ├─TableReader_15(Build)      | 6.00    | root      |               | data:Selection_14                                         |
| │ └─Selection_14             | 6.00    | cop[tikv] |               | not(isnull(test.b.id))                                    |
| │   └─TableFullScan_13       | 6.00    | cop[tikv] | table:b       | keep order:false                                          |
| └─TableReader_12(Probe)      | 6.00    | root      |               | data:Selection_11                                         |
|   └─Selection_11             | 6.00    | cop[tikv] |               | not(isnull(test.a.id))                                    |
|     └─TableFullScan_10       | 6.00    | cop[tikv] | table:a       | keep order:false                                          |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
7 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                          |
+---------+------+------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable without column equal ON condition |
+---------+------+------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
| username: 人如其名 | Original post link

Is it possible to support a hint syntax to prohibit SQL rewriting? For example, /*+ no_rewrite */.
When unable to use IndexJoin, I want to first form a subquery, then filter the function fields through association, rewriting as follows:

select * from (select /*+ USE_NL(b,a) */ a.id a_id, a.name a_name, b.id b_id, b.name b_name from a, b where a.id=b.id) c where c.a_name=substr(c.b_name,1,10);

However, the optimizer always pushes down the conditions. If I could use /*+ no_rewrite */ to prohibit SQL rewriting, it might achieve the result I want:

select /*+ no_rewrite */ * from (select /*+ INL_JOIN(a,b) */ a.id a_id, a.name a_name, b.id b_id, b.name b_name from a, b where a.id=b.id) c where c.a_name=substr(c.b_name,1,10);

Can we support a hint to prohibit SQL rewriting?

| username: TiDBer_oHSwKxOH | Original post link

No database can use an index for a query like “select * from a,b where a.id=b.id (uses index) and substr(a.name,1,10)=substr(b.name,1,10)”.

| username: cassblanca | Original post link

In TiDB, IndexJoin has some limitations. The official recommendation is to use the Index Join algorithm when the expected number of rows to be joined is relatively small (generally less than 10,000 rows). The Index Join algorithm consumes less memory, but if a large number of probe operations are required, its running speed may be slower than other join algorithms. If the data volume on the Build side is smaller than that on the Probe side, and the data on the Probe side has already been indexed, then in this case, the Index Join algorithm is more efficient.

| username: 人如其名 | Original post link

Can you provide an example where it doesn’t work???

I can give two examples where it does work: MySQL and OceanBase Community Edition.

MySQL:

-- Table structure:
create table a(id int, name varchar(100));
create table b(id int, name varchar(100));
alter table b add index(id);
insert into a values (1, '1'), (1, '11'), (2, '2');
insert into b values (1, '1'), (1, '22'), (2, '2');
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.30    |
+-----------+
1 row in set (0.00 sec)

-- It doesn't work in TiDB, but it works in MySQL:
mysql> explain analyze select /*+ NO_BNL(a, b) */ * from a, b where a.id = b.id and a.name = substr(b.name, 1, 10);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=2.12 rows=4) (actual time=0.036..0.046 rows=2 loops=1)
    -> Filter: (a.id is not null)  (cost=0.55 rows=3) (actual time=0.019..0.021 rows=3 loops=1)
        -> Table scan on a  (cost=0.55 rows=3) (actual time=0.018..0.020 rows=3 loops=1)
    -> Filter: (a.`name` = substr(b.`name`, 1, 10))  (cost=0.42 rows=2) (actual time=0.006..0.008 rows=1 loops=3)
        -> Index lookup on b using id (id=a.id)  (cost=0.42 rows=2) (actual time=0.005..0.006 rows=2 loops=3)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

OceanBase:

-- Table structure:
create table a(id int, name varchar(100));
create table b(id int, name varchar(100));
alter table b add index(id);
insert into a values (1, '1'), (1, '11'), (2, '2');
insert into b values (1, '1'), (1, '22'), (2, '2');
obclient [test]> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.2.0.0 |
+------------------------------+
1 row in set (0.001 sec)

-- It doesn't work in TiDB, but it works in OceanBase:
obclient [test]> explain extended pretty_color select /*+ USE_NL(a, b) leading(a, b) */ * from a, b where a.id = b.id and a.name = substr(b.name, 1, 10);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===============================================================                                                                                                  |
| |ID|OPERATOR                      |NAME |EST.ROWS|EST.TIME(us)|                                                                                                  |
| ---------------------------------------------------------------                                                                                                  |
| |0 |NESTED-LOOP JOIN              |     |2       |71          |                                                                                                  |
| |1 |├─TABLE FULL SCAN             |a    |3       |4           |                                                                                         |
| |2 |└─DISTRIBUTED TABLE RANGE SCAN|b(id)|1       |22          |                                                                                         |
| ===============================================================                                                                                                  |
| Outputs & filters:                                                                                                                                               |
| -------------------------------------                                                                                                                            |
|   0 - output([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)], [b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), filter(nil), rowset=256                        |
|       conds(nil), nl_params_([a.id(0x7f4b8ce40f50)(:0)], [a.name(0x7f4b8ce431e0)(:1)]), use_batch=true                                                           |
|   1 - output([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)]), filter(nil), rowset=256                                                                          |
|       access([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)]), partitions(p0)                                                                                   |
|       is_index_back=false, is_global_index=false,                                                                                                                |
|       range_key([a.__pk_increment(0x7f4b8ce44470)]), range(MIN ; MAX)always true                                                                                 |
|   2 - output([b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), filter([:1 = substr(b.name(0x7f4b8ce434c0), 1, 10)(0x7f4b8ce41a90)(0x7f4b8cec23a0)]), rowset=256 |
|       access([GROUP_ID(0x7f4b8cef0150)], [b.__pk_increment(0x7f4b8ce44740)], [b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), partitions(p0)                   |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                                 |
|       range_key([b.id(0x7f4b8ce41230)], [b.__pk_increment(0x7f4b8ce44740)]), range(MIN ; MAX),                                                                   |
|       range_cond([:0 = b.id(0x7f4b8ce41230)(0x7f4b8cec18b0)])                                                                                                    |
| Used Hint:                                                                                                                                                       |
| -------------------------------------                                                                                                                            |
|   /*+                                                                                                                                                            |
|                                                                                                                                                                  |
|       LEADING(("a" "b"))                                                                                                                                         |
|       USE_NL("b")                                                                                                                                                |
|   */                                                                                                                                                             |
| Qb name trace:                                                                                                                                                   |
| -------------------------------------                                                                                                                            |
|   stmt_id:0, stmt_type:T_EXPLAIN                                                                                                                                 |
|   stmt_id:1, SEL$1                                                                                                                                               |
| Outline Data:                                                                                                                                                    |
| -------------------------------------                                                                                                                            |
|   /*+                                                                                                                                                            |
|       BEGIN_OUTLINE_DATA                                                                                                                                         |
|       LEADING(@"SEL$1" ("test"."a"@"SEL$1" "test"."b"@"SEL$1"))                                                                                                  |
|       USE_NL(@"SEL$1" "test"."b"@"SEL$1")                                                                                                                        |
|       FULL(@"SEL$1" "test"."a"@"SEL$1")                                                                                                                          |
|       INDEX(@"SEL$1" "test"."b"@"SEL$1" "id")                                                                                                                    |
|       USE_DAS(@"SEL$1" "test"."b"@"SEL$1")                                                                                                                       |
|       OPTIMIZER_FEATURES_ENABLE('4.0.0.0')                                                                                                                       |
|       END_OUTLINE_DATA                                                                                                                                           |
|   */                                                                                                                                                             |
| Optimization Info:                                                                                                                                               |
| -------------------------------------                                                                                                                            |
|   a:                                                                                                                                                             |
|       table_rows:3                                                                                                                                               |
|       physical_range_rows:3                                                                                                                                      |
|       logical_range_rows:3                                                                                                                                       |
|       index_back_rows:0                                                                                                                                          |
|       output_rows:3                                                                                                                                              |
|       table_dop:1                                                                                                                                                |
|       dop_method:Table DOP                                                                                                                                       |
|       avaiable_index_name:[a]                                                                                                                                    |
|       stats version:0                                                                                                                                            |
|       dynamic sampling level:1                                                                                                                                   |
|   b:                                                                                                                                                             |
|       table_rows:3                                                                                                                                               |
|       physical_range_rows:1                                                                                                                                      |
|       logical_range_rows:1                                                                                                                                       |
|       index_back_rows:1                                                                                                                                          |
|       output_rows:0                                                                                                                                              |
|       table_dop:1                                                                                                                                                |
|       dop_method:DAS DOP                                                                                                                                         |
|       avaiable_index_name:[id, b]                                                                                                                                |
|       unstable_index_name:[b]                                                                                                                                    |
|       stats version:0                                                                                                                                            |
|       dynamic sampling level:1                                                                                                                                   |
|   Plan Type:                                                                                                                                                     |
|       LOCAL                                                                                                                                                      |
|   Note:                                                                                                                                                          |
|       Degree of Parallelisim is 1 because of table property                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
72 rows in set (0.004 sec)
| username: qw4990-PingCAP | Original post link

We have looked into it and found that the reason why IndexJoin cannot be used is that the optimizer currently uses all equality conditions as Join Keys, including substr(...). However, since IndexJoin does not support substr(...), if substr(...) is included in the Join Key, IndexJoin cannot be generated, and only HashJoin can be used.

A better approach would be to not include substr(...) in the Join Key but place it in the Other Key, as shown in the image below. This way, IndexJoin can be used:

We have recorded an issue and will fix this problem before the next LTS v7.5: planner: support using partial EQ predicates to construct IndexJoins · Issue #47233 · pingcap/tidb · GitHub @人如其名

| username: asd80703406 | Original post link

IndexJoin does not support substr(...), so if you put substr(...) in the Join Key, IndexJoin cannot be generated and only HashJoin can be used.
–Where did you get this information?

| username: 人如其名 | Original post link

Simply put, this is obtained from his mind. This expert is the core designer and implementer of the TiDB optimizer.

| username: asd80703406 | Original post link

:exploding_head: Alright, alright. I was thinking about diving deeper into it, but I don’t have the energy to look at the source code right now.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.