Questions about the purpose of build and probe in execution plans

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 对于执行计划中build和probe的用途的疑惑

| username: 人如其名

[Test Environment for TiDB] Testing
[TiDB Version] v7.0.0
In the overview chapter of the document execution plan, there are some descriptions of the meanings of build and probe:

The description here is not problematic, but having both build and probe in the execution plan can easily confuse people with the actual execution of operators like hash join.

  1. hash_join always selects a small table as the buildSide, and it is also on the build side in the execution plan, which is not problematic.
mysql> explain select count(*) from (select /*+ hash_join(a,b) */ * from customer a, orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| id                            | estRows      | task      | access object                  | operator info                                                                |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| StreamAgg_10                  | 1.00         | root      |                                | funcs:count(1)->Column#18                                                    |
| └─HashJoin_60                 | 150000000.00 | root      |                                | inner join, equal:[eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey)] |
|   ├─IndexReader_35(Build)     | 15000000.00  | root      |                                | index:IndexFullScan_34                                                       |
|   │ └─IndexFullScan_34        | 15000000.00  | cop[tikv] | table:a, index:idx1(C_PHONE)   | keep order:false                                                             |
|   └─IndexReader_39(Probe)     | 150000000.00 | root      |                                | index:IndexFullScan_38                                                       |
|     └─IndexFullScan_38        | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | keep order:false                                                             |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
  1. inl_join, the outer table retrieves some records, searches for matching records in the inner table, and creates a hash_table on the result set found in the inner table as the buildSide, while the outer records perform the probe.
    TiDB 源码阅读系列文章(十一)Index Lookup Join | PingCAP
    From inner:IndexReader_40, it can be seen that IndexReader_40 is the inner table, and the build occurs during actual execution. This is exactly the opposite of what is marked in the execution plan, which can easily cause confusion.
mysql> explain select count(*) from (select /*+ inl_join(a,b) */ * from customer a, orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                            | estRows      | task      | access object                  | operator info                                                                                                                                                                   |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_10                  | 1.00         | root      |                                | funcs:count(1)->Column#18                                                                                                                                                       |
| └─IndexJoin_41                | 150000000.00 | root      |                                | inner join, inner:IndexReader_40, outer key:tpch100.customer.c_custkey, inner key:tpch100.orders.o_custkey, equal cond:eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey) |
|   ├─IndexReader_33(Build)     | 15000000.00  | root      |                                | index:IndexFullScan_32                                                                                                                                                          |
|   │ └─IndexFullScan_32        | 15000000.00  | cop[tikv] | table:a, index:idx1(C_PHONE)   | keep order:false                                                                                                                                                                |
|   └─IndexReader_40(Probe)     | 150000000.00 | root      |                                | index:IndexRangeScan_39                                                                                                                                                         |
|     └─IndexRangeScan_39       | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | range: decided by [eq(tpch100.orders.o_custkey, tpch100.customer.c_custkey)], keep order:false                                                                                  |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

If Build always executes before Probe, then it can be done by following the execution principle of from right to left, from top to bottom without needing to mark Build and Probe, right? Marking them actually makes it easier to confuse with the internal implementation. Or would it be more reasonable to directly mark (inner, outer)?

| username: 考试没答案 | Original post link

It doesn’t really matter. What you mentioned seems to be the same as the official expression. Some people like to call it the outer table and inner table. Some prefer to call it the driving table and driven table. In TiDB, they are called build and probe. Some even call the inner table the main table. It’s all about distinguishing the primary and secondary roles. Different documents have different terminologies. This is what I’ve encountered; you can use it as a reference.

| username: 人如其名 | Original post link

Because in the internal implementation of index_join, the inner table actually builds the hash_table. However, in the execution plan, the operator for the outer table is marked as build, which can be quite confusing.

| username: 考试没答案 | Original post link

Yes, I also spent a long time understanding this part.

| username: FutureDB | Original post link

This is also something that has always confused me, especially whether the table scan under the operator marked as the Build side will always be processed before the table scan under the operator marked as the Probe side? In actual scenarios, I have found that this is not the case.

| username: Kongdom | Original post link

Are there any real cases? I haven’t encountered this scenario yet.