Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: tidb index join 强行指定连接顺序后执行时间变化的问题

Dear experts, please test the execution performance of SQL statements under the tpch database involving the customer table (b table) and orders table (a table).
mysql> select count() from customer;
±---------+
| count() |
±---------+
| 3000000 |
±---------+
1 row in set (0.41 sec)
mysql> select count() from orders;
±---------+
| count() |
±---------+
| 29955968 |
±---------+
1 row in set (4.38 sec)
- First, I executed statement 1. The join order of the index join is determined by the optimizer itself, with the b table as the outer table and the a table as the inner table. The execution time was found to be 52.99s.
mysql> explain analyze select /*+ INL_JOIN(a,b) */ a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_19 | 29955968.00 | 29955968 | root | | time:53s, loops:29255, inner:{total:4m16.6s, concurrency:5, task:126, construct:1.76s, fetch:4m11.4s, build:3.42s}, probe:4.25s | inner join, inner:IndexReader_18, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 55.6 MB | N/A |
| ├─TableReader_31(Build) | 3000000.00 | 3000000 | root | | time:93.9ms, loops:2941, cop_task: {num: 116, max: 429.3ms, min: 3.11ms, avg: 88ms, p95: 266.5ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4.35s, tot_wait: 434ms, rpc_num: 116, rpc_time: 10.2s, copr_cache: disabled, distsql_concurrency: 15} | data:TableFullScan_30 | 13.0 MB | N/A |
| │ └─TableFullScan_30 | 3000000.00 | 3000000 | cop[tikv] | table:b | tikv_task:{proc max:278ms, min:0s, avg: 35.5ms, p80:50ms, p95:155ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 41.6ms, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 10583}}} | keep order:false | N/A | N/A |
| └─IndexReader_18(Probe) | 29955968.00 | 29955968 | root | | time:4m7.8s, loops:29678, cop_task: {num: 2635, max: 906.4ms, min: 2.16ms, avg: 119.1ms, p95: 436.1ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 2m55.8s, tot_wait: 22.4s, rpc_num: 2635, rpc_time: 5m13.7s, copr_cache: disabled, distsql_concurrency: 15} | index:IndexRangeScan_17 | 12.1 KB | N/A |
| └─IndexRangeScan_17 | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | tikv_task:{proc max:723ms, min:0s, avg: 66ms, p80:104ms, p95:352ms, iters:39564, tasks:2635}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958329, get_snapshot_time: 403.3ms, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 16464198}}} | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false | N/A | N/A |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (52.99 sec)
- Executed statement 2, which forcibly specified the join order, making the a table the outer table and the b table the inner table. The execution time was significantly reduced to 27.66s.
mysql> explain analyze select /*+ INL_JOIN(b) */ a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_12 | 29955968.00 | 29955968 | root | | time:27.6s, loops:29255, inner:{total:2m9.3s, concurrency:5, task:1179, construct:11.6s, fetch:1m57.3s, build:398.2ms}, probe:7.04s | inner join, inner:TableReader_9, outer key:tpch.orders.o_custkey, inner key:tpch.customer.c_custkey, equal cond:eq(tpch.orders.o_custkey, tpch.customer.c_custkey) | 15.2 MB | N/A |
| ├─IndexReader_20(Build) | 29955968.00 | 29955968 | root | | time:523.5ms, loops:29300, cop_task: {num: 784, max: 250.2ms, min: 2.26ms, avg: 81.5ms, p95: 168.1ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 21.1s, tot_wait: 866ms, rpc_num: 784, rpc_time: 1m3.9s, copr_cache: disabled, distsql_concurrency: 15} | index:IndexFullScan_19 | 12.3 MB | N/A |
| │ └─IndexFullScan_19 | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | tikv_task:{proc max:115ms, min:0s, avg: 25.9ms, p80:39ms, p95:66ms, iters:32383, tasks:784}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 29956752, get_snapshot_time: 133.9ms, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 23760}}} | keep order:false | N/A | N/A |
| └─TableReader_9(Probe) | 29955968.00 | 2001764 | root | | time:1m56.3s, loops:3529, cop_task: {num: 5146, max: 140.6ms, min: 621.7µs, avg: 25.4ms, p95: 56.4ms, max_proc_keys: 992, p95_proc_keys: 992, tot_proc: 16.5s, tot_wait: 6.31s, rpc_num: 5146, rpc_time: 2m10.6s, copr_cache: disabled, distsql_concurrency: 15} | data:TableRangeScan_8 | N/A | N/A |
| └─TableRangeScan_8 | 29955968.00 | 2001764 | cop[tikv] | table:b | tikv_task:{proc max:75ms, min:0s, avg: 3.59ms, p80:5ms, p95:10ms, iters:17550, tasks:5146}, scan_detail: {total_process_keys: 2001764, total_process_keys_size: 407321028, total_keys: 3005092, get_snapshot_time: 129.1ms, rocksdb: {key_skipped_count: 1999798, block: {cache_hit_count: 5439074}}} | range: decided by [tpch.orders.o_custkey], keep order:false | N/A | N/A |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (27.66 sec)
This is very confusing. Initially, I guessed that even though the a table has a large amount of data, since the a table uses an index and does not need to go back to the table, the scanning speed is very fast despite having nearly 30 million rows, so the a table as the outer table would be faster.
So I tried the third statement:
mysql> explain analyze select /*+ INL_JOIN(b) */ a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
mysql> explain analyze select /*+ INL_JOIN(b) */ a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_12 | 29955968.00 | 29955968 | root | | time:2m58.7s, loops:29255, inner:{total:14m50.6s, concurrency:5, task:1180, construct:46.3s, fetch:13m59.3s, build:4.91s}, probe:18.3s | inner join, inner:TableReader_9, outer key:tpch.orders.o_custkey, inner key:tpch.customer.c_custkey, equal cond:eq(tpch.orders.o_custkey, tpch.customer.c_custkey) | 21.9 MB | N/A |
| ├─TableReader_18(Build) | 29955968.00 | 29955968 | root | | time:1.59s, loops:29317, cop_task: {num: 1025, max: 608.4ms, min: 3.23ms, avg: 158.7ms, p95: 333.4ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 1m30.4s, tot_wait: 7.71s, rpc_num: 1025, rpc_time: 2m42.6s, copr_cache: disabled, distsql_concurrency: 15} | data:TableFullScan_17 | 18.4 MB | N/A |
| │ └─TableFullScan_17 | 29955968.00 | 29955968 | cop[tikv] | table:a | tikv_task:{proc max:515ms, min:0s, avg: 84.3ms, p80:154ms, p95:228ms, iters:33322, tasks:1025}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29956993, get_snapshot_time: 1.68s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 63232, read_count: 16425, read_byte: 282.6 MB, read_time: 11.8s}}} | keep order:false | N/A | N/A |
| └─TableReader_9(Probe) | 29955968.00 | 29744299 | root | | time:13m43.2s, loops:34209, cop_task: {num: 39435, max: 595.3ms, min: 657.8µs, avg: 77.6ms, p95: 211.6ms, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 21m47s, tot_wait: 8m43s, rpc_num: 39435, rpc_time: 50m57.4s, copr_cache: disabled, distsql_concurrency: 15} | data:TableRangeScan_8 | N/A | N/A |
| └─TableRangeScan_8 | 29955968.00 | 29744299 | cop[tikv] | table:b | tikv_task:{proc max:556ms, min:0s, avg: 33.4ms, p80:60ms, p95:133ms, iters:162048, tasks:39435}, scan_detail: {total_process_keys: 29744299, total_process_keys_size: 6052480951, total_keys: 29943155, get_snapshot_time: 2.1s, rocksdb: {key_skipped_count: 335362, block: {cache_hit_count: 141357867, read_count: 6, read_byte: 616.2 KB, read_time: 72.8ms}}} | range: decided by [tpch.orders.o_custkey], keep order:false | N/A | N/A |
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (2 min 58.70 sec)
In the third statement, the a table could not use the index, and the total execution time reached 2 min 58.70 sec, which is very confusing. Additionally, in the probe phase of the b table, there are 29744299 rows, but the b table only has 3 million rows in total, which is hard to explain.
Seeking expert explanation:
-
Why is statement 2 much faster than statement 1?
My guess: -
The a table uses an index and does not need to go back to the table, so the scanning speed is fast.
-
In the probe phase of the index join, statement 1 needs to perform secondary index lookups on nearly 30 million rows of the a table, which is slower than scanning the entire 3 million rows of the b table.
-
In the execution plan of statement 3, the probe phase of the b table has 29744299 rows, but the b table only has 3 million rows in total. How can this be explained?