Questions about the execution plan showing hashagg and build?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 执行计划显示hashagg和build的疑问?

| username: Raymond

[TiDB Usage Environment] Test
[TiDB Version] 6.5.3
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Problem Phenomenon and Impact]
Hello teachers,

While studying the execution plan of TiDB, I have the following questions. The data source is from TPC-H.

mysql> explain analyze select * from customer where C_CUSTKEY in (select O_CUSTKEY from orders limit 1);
+----------------------------------+---------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| id                               | estRows | actRows | task      | access object                                 | execution info                                                                                                                                                                                            | operator info                                                                                                                                                           | memory    | disk    |
+----------------------------------+---------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| IndexHashJoin_16                 | 1.00    | 1       | root      |                                               | time:2.73ms, loops:2, inner:{total:1.05ms, concurrency:5, task:1, construct:12.1µs, fetch:1.03ms, build:3.69µs, join:4.19µs}                                                                              | inner join, inner:TableReader_11, outer key:tpch2.orders.o_custkey, inner key:tpch2.customer.c_custkey, equal cond:eq(tpch2.orders.o_custkey, tpch2.customer.c_custkey) | 146.4 KB  | N/A     |
| ├─HashAgg_21(Build)              | 1.00    | 1       | root      |                                               | time:1.53ms, loops:3                                                                                                                                                                                      | group by:tpch2.orders.o_custkey, funcs:firstrow(tpch2.orders.o_custkey)->tpch2.orders.o_custkey                                                                         | 898 Bytes | 0 Bytes |
| │ └─Limit_22                     | 1.00    | 1       | root      |                                               | time:1.52ms, loops:2                                                                                                                                                                                      | offset:0, count:1                                                                                                                                                       | N/A       | N/A     |
| │   └─IndexReader_27             | 1.00    | 1       | root      |                                               | time:1.51ms, loops:1, cop_task: {num: 1, max: 1.57ms, proc_keys: 1, rpc_num: 1, rpc_time: 1.54ms, copr_cache: disabled, distsql_concurrency: 1}                                                           | index:Limit_26                                                                                                                                                          | 241 Bytes | N/A     |
| │     └─Limit_26                 | 1.00    | 1       | cop[tikv] |                                               | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 46, total_keys: 2, get_snapshot_time: 601.8µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 5}}}  | offset:0, count:1                                                                                                                                                       | N/A       | N/A     |
| │       └─IndexFullScan_25       | 1.00    | 1       | cop[tikv] | table:orders, index:indexO_CUSTKEY(O_CUSTKEY) | tikv_task:{time:0s, loops:1}                                                                                                                                                                              | keep order:false                                                                                                                                                        | N/A       | N/A     |
| └─TableReader_11(Probe)          | 1.00    | 1       | root      |                                               | time:951.8µs, loops:2, cop_task: {num: 1, max: 907µs, proc_keys: 1, rpc_num: 1, rpc_time: 888.3µs, copr_cache: disabled, distsql_concurrency: 15}                                                         | data:TableRangeScan_10                                                                                                                                                  | N/A       | N/A     |
|   └─TableRangeScan_10            | 1.00    | 1       | cop[tikv] | table:customer                                | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 407.4µs, rocksdb: {block: {cache_hit_count: 5}}}                       | range: decided by [tpch2.orders.o_custkey], keep order:false                                                                                                            | N/A       | N/A     |
+----------------------------------+---------+---------+-----------+-----------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
8 rows in set (0.00 sec)

I have the following two questions:

  1. For this kind of limit 1 statement, why does the hashagg operator appear?
  2. Why does the hashagg operator have a build display? Is this build related to the IndexHashJoin operator above?
mysql> explain analyze select O_CUSTKEY from orders limit 1;
+--------------------------+---------+---------+-----------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+------+
| id                       | estRows | actRows | task      | access object                                 | execution info                                                                                                                                                                                                     | operator info          | memory  | disk |
+--------------------------+---------+---------+-----------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+------+
| Limit_6                  | 1.00    | 1       | root      |                                               | time:1.59ms, loops:2                                                                                                                                                                                               | offset:0, count:1      | N/A     | N/A  |
| └─IndexReader_11         | 1.00    | 32      | root      |                                               | time:1.59ms, loops:1, cop_task: {num: 1, max: 1.44ms, proc_keys: 224, rpc_num: 1, rpc_time: 1.41ms, copr_cache: disabled, distsql_concurrency: 15}                                                                 | index:IndexFullScan_10 | 4.19 KB | N/A  |
|   └─IndexFullScan_10     | 1.00    | 224     | cop[tikv] | table:orders, index:indexO_CUSTKEY(O_CUSTKEY) | tikv_task:{time:0s, loops:3}, scan_detail: {total_process_keys: 224, total_process_keys_size: 10304, total_keys: 225, get_snapshot_time: 611.7µs, rocksdb: {key_skipped_count: 224, block: {cache_hit_count: 6}}}  | keep order:false       | N/A     | N/A  |
+--------------------------+---------+---------+-----------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+------+
3 rows in set (0.00 sec)
| username: Raymond | Original post link

For example, this limit also has a build.

| username: 数据库真NB | Original post link

Is this the number of searches in a B+ tree storage?

| username: 有猫万事足 | Original post link

First of all, build and probe always appear in pairs, usually representing the two sides of a join.

The documentation provides some general introductions. If you look into the code, you will find that both build and probe start a thread to execute, so it’s hard to say which one actually starts first. At most, the build thread establishes first.

As for why there is a hashagg operator in limit1, I think this is indeed an issue with the execution plan. There should be room for optimization.

| username: Raymond | Original post link

Yes, the execution plan of TiDB does indeed appear somewhat confusing for build and probe.

| username: 小龙虾爱大龙虾 | Original post link

  1. This is equivalent only when adding group by, so the HASH AGG operator will appear. Refer to: 子查询相关的优化 | PingCAP 文档中心

  2. This is because it is the build side of the index hash join.

| username: Raymond | Original post link

  1. This is because it is the build side of the index hash join.
    Then it should be displayed after Indexhashjoin.
| username: 小龙虾爱大龙虾 | Original post link

It shouldn’t be. Simply put, table A and table B are related, with an inner table and an outer table, a driving table and a driven table, a build side and a probe side. These all refer to tables A and B, not the join algorithm chosen for the two tables.

| username: Raymond | Original post link

I misunderstood, of course, I wasn’t referring to a specific join algorithm. I had a bit of a memory lapse here. Thanks, expert, for resolving my doubts. The optimization of subqueries is also present in other databases. Thanks.

| username: TiDBer_JUi6UvZm | Original post link

Learned.

| username: Raymond | Original post link

Hello, I have a question about the optimization of subqueries according to the official website.

The subquery
select * from t1 where t1.a in (select t2.a from t2) will be rewritten as select t1.* from t1, (select distinct(a) a from t2) t2 where t1.a = t2.a

So t2 can only be the driving table? Can’t it be the driven table? I think it should be possible, I remember MySQL can.

| username: 小龙虾爱大龙虾 | Original post link

TiDB doesn’t work yet, you can rewrite it as exists.

| username: shigp_TIDBER | Original post link

Learning, learning.

| username: 不想干活 | Original post link

Learned a lot.

| username: oceanzhang | Original post link

I would like to ask what build and probe represent.

| username: Raymond | Original post link

Thank you for your reply, teacher.

| username: xiaoqiao | Original post link

Learned.

| username: 舞动梦灵 | Original post link

Got it. So far, what I understand is that apart from knowing the execution order or parallel execution of build and probe, everything else is understood in the same way as MySQL.