How is pushdown implemented in TiDB when associating multiple tables?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 请教TiDB多表关联时是怎么实现下推的?

| username: 江湖故人

In distributed databases like Greenplum, when the join key and the distribution key are not the same, there are concepts of redistribution and broadcasting. Can TiDB push down joins to TiKV nodes when joining multiple tables, and how is this specifically implemented?

| username: Jellybean | Original post link

The overall idea is that during the execution plan optimization phase, multiple tables are first converted into multiple groups of two-table join queries. The two-table join queries are then transformed into single-table queries. At the storage layer, single tables are filtered and data is read. The data is then returned to the computation layer for joint queries. Different join operators for joint queries will follow their respective computation and data reading processes.

| username: heiwandou | Original post link

After table sharding, the data in each shard is computed on their respective nodes, reducing the computational pressure on the TiDB nodes. The TiDB nodes are responsible for aggregating the computations.

| username: tidb菜鸟一只 | Original post link

My understanding is that TiDB does not push down joins to TiKV nodes; it only pushes down predicates in order to minimize the amount of data returned by each TiKV node and reduce transmission volume. However, it does not directly push down the join of two tables to the TiKV nodes and only return the joined data.

| username: Lloyd-Pottiger | Original post link

Join operations cannot be pushed down to TiKV, but TiFlash supports join pushdown. You can add a TiFlash node and add TiFlash replicas for the relevant tables.

| username: 江湖故人 | Original post link

Boss, do you have any related materials?

| username: 春风十里 | Original post link

Please refer to the official documentation regarding the pushdown of TopN and Limit:
TopN and Limit Pushdown | PingCAP Documentation Center

Example 2: TopN Pushdown over Join (Sorting rule depends only on columns from the outer table)

create table t(id int primary key, a int not null);
create table s(id int primary key, a int not null);
explain select * from t left join s on t.a = s.a order by t.a limit 10;
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
| id                               | estRows  | task      | access object | operator info                                   |
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
| TopN_12                          | 10.00    | root      |               | test.t.a, offset:0, count:10                    |
| └─HashJoin_17                    | 12.50    | root      |               | left outer join, equal:[eq(test.t.a, test.s.a)] |
|   ├─TopN_18(Build)               | 10.00    | root      |               | test.t.a, offset:0, count:10                    |
|   │ └─TableReader_26             | 10.00    | root      |               | data:TopN_25                                    |
|   │   └─TopN_25                  | 10.00    | cop[tikv] |               | test.t.a, offset:0, count:10                    |
|   │     └─TableFullScan_24       | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo                  |
|   └─TableReader_30(Probe)        | 10000.00 | root      |               | data:TableFullScan_29                           |
|     └─TableFullScan_29           | 10000.00 | cop[tikv] | table:s       | keep order:false, stats:pseudo                  |
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
8 rows in set (0.01 sec)

In this query, the sorting rule of the TopN operator depends only on columns from the outer table t, so TopN can be pushed down to be computed before the Join to reduce the computational overhead during the Join. Additionally, TiDB also pushes down TopN to the storage layer.

Example 4: TopN Converted to Limit

create table t(id int primary key, a int not null);
create table s(id int primary key, a int not null);
explain select * from t left join s on t.a = s.a order by t.id limit 10;
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
| id                               | estRows  | task      | access object | operator info                                   |
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
| TopN_12                          | 10.00    | root      |               | test.t.id, offset:0, count:10                   |
| └─HashJoin_17                    | 12.50    | root      |               | left outer join, equal:[eq(test.t.a, test.s.a)] |
|   ├─Limit_21(Build)              | 10.00    | root      |               | offset:0, count:10                              |
|   │ └─TableReader_31             | 10.00    | root      |               | data:Limit_30                                   |
|   │   └─Limit_30                 | 10.00    | cop[tikv] |               | offset:0, count:10                              |
|   │     └─TableFullScan_29       | 10.00    | cop[tikv] | table:t       | keep order:true, stats:pseudo                   |
|   └─TableReader_35(Probe)        | 10000.00 | root      |               | data:TableFullScan_34                           |
|     └─TableFullScan_34           | 10000.00 | cop[tikv] | table:s       | keep order:false, stats:pseudo                  |
+----------------------------------+----------+-----------+---------------+-------------------------------------------------+
8 rows in set (0.00 sec)

In the above query, TopN is first pushed down to the outer table t. Since it needs to sort by t.id, which is the primary key of table t, it can be read in order directly (keep order:true), thus omitting the sorting in TopN and simplifying it to Limit.

| username: 小龙虾爱大龙虾 | Original post link

When using the TiKV storage engine with TiDB, JOIN operations cannot be pushed down to TiKV for execution as you mentioned. At most, various conditions can be pushed down to increase execution efficiency, but the JOIN operator execution is still performed at the TiDB Server computation layer. You can refer to the official documentation’s optimization section to understand operator execution: TiDB 执行计划概览 | PingCAP 文档中心

When using the TiFlash storage engine, you can execute in MPP mode, allowing TiFlash to perform JOIN operations. This enables the use of operators such as Broadcast Hash Join, Shuffled Hash Join, and Shuffled Hash Aggregation, which involves data redistribution, broadcasting, and other operations as you mentioned. Refer to: 使用 MPP 模式 | PingCAP 文档中心

| username: dba远航 | Original post link

The prerequisite for pushdown is to use the shard key for join queries.

| username: 江湖故人 | Original post link

As I expected, TiDB’s region sharding is not based on hash, and there is no concept of a replicated table, so it cannot push down associations. Thank you all for clarifying :pray:

| username: 江湖故人 | Original post link

Just looked at Runtime Filter, the design is very clever :+1:

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.