Questions about the Subquery Execution Process

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 子查询执行过程疑问

| username: Raymond

[TiDB Usage Environment] Test
[TiDB Version] 6.5.3
Hello teachers, I have a question about subqueries.
The statement and execution plan are as follows, and the data is from the tpch dataset.

mysql> explain analyze select * from customer where C_CUSTKEY in (select O_CUSTKEY from orders where O_TOTALPRICE >30);
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                               | estRows     | actRows  | task      | access object                                  | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | operator info                                                                                                                                                      | memory   | disk |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexHashJoin_16                 | 2400000.00  | 1999956  | root      |                                                | time:5m52s, loops:1957, inner:{total:29m12.3s, concurrency:5, task:125, construct:1.96s, fetch:28m59.7s, build:311.8ms, join:10.7s}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | semi join, inner:IndexLookUp_13, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 142.2 MB | N/A  |
| ├─TableReader_33(Build)          | 3000000.00  | 3000000  | root      |                                                | time:958.5ms, loops:2941, cop_task: {num: 116, max: 3.51s, min: 1.99ms, avg: 402.5ms, p95: 1.07s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 27.6s, tot_wait: 8.76s, rpc_num: 116, rpc_time: 46.7s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                                                                                                                                                                                                                                                                                                                                                                    | data:TableFullScan_32                                                                                                                                              | 90.2 MB  | N/A  |
| │ └─TableFullScan_32             | 3000000.00  | 3000000  | cop[tikv] | table:customer                                 | tikv_task:{proc max:3.22s, min:0s, avg: 219.1ms, p80:359ms, p95:785ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 1.56s, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 1731, read_count: 8688, read_byte: 194.0 MB, read_time: 8.81s}}}                                                                                                                                                                                                                                                                                                                                                     | keep order:false                                                                                                                                                   | N/A      | N/A  |
| └─IndexLookUp_13(Probe)          | 29955968.00 | 29955968 | root      |                                                | time:28m56.7s, loops:29442, index_task: {total_time: 22m42.1s, fetch_handle: 17m4s, build: 3.35ms, wait: 5m38.1s}, table_task: {total_time: 1h29m2.8s, num: 1899, concurrency: 5}, next: {wait_index: 2m6.3s, wait_table_lookup_build: 977.4ms, wait_table_lookup_resp: 26m47.3s}                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                    | 110.8 KB | N/A  |
|   ├─IndexRangeScan_10(Build)     | 29955968.00 | 29955968 | cop[tikv] | table:orders, index:index_o_custkey(O_CUSTKEY) | time:17m3s, loops:29933, cop_task: {num: 2628, max: 5.1s, min: 2.2ms, avg: 595.4ms, p95: 2.22s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 19m45.2s, tot_wait: 4m12.9s, rpc_num: 2628, rpc_time: 26m4.7s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:4.16s, min:0s, avg: 451.1ms, p80:734ms, p95:2.02s, iters:39517, tasks:2628}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958320, get_snapshot_time: 5.43s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 12009292, read_count: 19647, read_byte: 235.1 MB, read_time: 3m41.6s}}}                                                                   | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false                                                                           | N/A      | N/A  |
|   └─Selection_12(Probe)          | 29955968.00 | 29955968 | cop[tikv] |                                                | time:1h28m47.9s, loops:31426, cop_task: {num: 96848, max: 20.4s, min: 399.6µs, avg: 579.6ms, p95: 2.78s, max_proc_keys: 656, p95_proc_keys: 429, tot_proc: 12h22m51.3s, tot_wait: 2h42m30.7s, rpc_num: 96914, rpc_time: 15h36m8.7s, copr_cache: disabled, distsql_concurrency: 15}, backoff{regionMiss: 114ms, regionScheduling: 2ms}, tikv_task:{proc max:19.6s, min:0s, avg: 460.6ms, p80:557ms, p95:2.54s, iters:338860, tasks:96848}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29972508, get_snapshot_time: 13s, rocksdb: {key_skipped_count: 33090, block: {cache_hit_count: 115911778, read_count: 506722, read_byte: 8.53 GB, read_time: 14m51.1s}}}  | gt(tpch.orders.o_totalprice, 30)                                                                                                                                   | N/A      | N/A  |
|     └─TableRowIDScan_11          | 29955968.00 | 29955968 | cop[tikv] | table:orders                                   | tikv_task:{proc max:19.6s, min:0s, avg: 460.5ms, p80:557ms, p95:2.54s, iters:338860, tasks:96848}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | keep order:false                                                                                                                                                   | N/A      | N/A  |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
7 rows in set (5 min 51.98 sec)
mysql> select count(*) from orders;
+----------+
| count(*) |
+----------+
| 29955968 |
+----------+
1 row in set (6.62 sec)
mysql> select count(*) from customer;
+----------+
| count(*) |
+----------+
|  3000000 |
+----------+
1 row in set (1.18 sec)

I have already set tidb_opt_insubq_to_join_and_agg to off, disabling the function of converting subqueries into join and aggregation (I considered scenarios where the data volume after aggregation and deduplication in the subquery is particularly large) 子查询相关的优化 | PingCAP 文档中心

In such a semi-join, I think it should only take one row of data from the customer table to match with the orders table’s O_CUSTKEY. If it matches and O_TOTALPRICE is greater than 30, it can be returned to the result set, so redundant duplicate data does not need to be matched (customer and orders have a one-to-many relationship). From the execution plan, every row of data in orders is matched. Can this be further optimized to a firstmatch row match, i.e., take one row of data from customer to match with orders, and if it matches successfully once, immediately return this row of data from customer, and then take the next row of data from customer?

| username: tidb菜鸟一只 | Original post link

What you’re talking about is equivalent to a nestloop join, which is only chosen when the driving table has fewer than 10,000 rows. With your data volume, you can only choose a hash join, right?

| username: Raymond | Original post link

I am talking about this form of optimization, and I am not sure if my understanding is correct.
https://dev.mysql.com/doc/refman/8.0/en/semijoins.html

| username: 人如其名 | Original post link

Refer to the post I wrote earlier: 子查询重复值多的情况下半连接执行太慢 - TiDB 的问答社区, it should be the same BUG, and there is a fix issue in it.