When joining tables in TiFlash, the Batch_Point_Get operator is used, preventing the use of TiFlash's MPP algorithm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在tiflash中join表的时候走了Batch_Point_Get算子从而导致没法用上tiflash的mpp算法

| username: 青木科技-熊猛

Main Table:

CREATE TABLE `result_daily_sku_id` (
   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
   `day` date NOT NULL DEFAULT '1970-01-01' COMMENT 'Date',
   `brand_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Brand ID',
   `shop_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Shop ID',
   `sku_id` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Platform Barcode ID',
   `sku_sn` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Merchant Code',
   `is_live` tinyint(2) unsigned NOT NULL DEFAULT '0' COMMENT 'Is Live Product 0: No 1: Yes',
   PRIMARY KEY (`shop_id`,`day`,`sku_id`,`sku_sn`,`is_live`) /*T![clustered_index] CLUSTERED */,
   UNIQUE KEY `idx_uniq_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=1 COMMENT='Main Table';

Secondary Table:

CREATE TABLE `shop` (
    `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'Shop ID',
    `shop_category` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'Shop Category',
    PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=1 COMMENT='Secondary Table';
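
Both tables also need a TiFlash replica before any TiFlash or MPP plan can be chosen at all; the original post does not show that step, but presumably something along these lines had already been run:

-- Assumed setup (not shown in the original post): one TiFlash replica per table.
ALTER TABLE `result_daily_sku_id` SET TIFLASH REPLICA 1;
ALTER TABLE `shop` SET TIFLASH REPLICA 1;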

Executing the following SQL:

EXPLAIN ANALYZE
SELECT /*+ read_from_storage(tiflash[result_daily_sku_id,shop]) */
    `shop`.`shop_category`, COUNT(1) num
FROM `result_daily_sku_id`
         LEFT JOIN `shop` ON `result_daily_sku_id`.`shop_id` = `shop`.`id`
WHERE `result_daily_sku_id`.`day` BETWEEN '2023-01-01' AND '2023-07-31' AND shop.id IN (12, 17)
GROUP BY `shop`.`shop_category`;

Because the secondary table shop is accessed through the Batch_Point_Get operator, TiFlash only reads the data and returns it to the root node, where the join is done with MergeJoin. That plan is far from optimal: once the main table holds a large amount of data, the SQL becomes very slow.

How can we prevent the shop table from using Batch_Point_Get and make it use TableFullScan in TiFlash instead, so that the join and aggregation run entirely in TiFlash's MPP mode?

| username: 有猫万事足 | Original post link

If you want TiDB to ignore the cost estimation of the optimizer and force the use of MPP, you can set it with the following statements:

set @@session.tidb_allow_mpp=1;
set @@session.tidb_enforce_mpp=1;

You can try this to see if there is any improvement.
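
As a quick check (a sketch that simply reuses the query from the original post), you can re-run EXPLAIN in the same session after setting both variables and look for mpp[tiflash] in the task column:

set @@session.tidb_allow_mpp=1;
set @@session.tidb_enforce_mpp=1;

-- If MPP is chosen, the join and aggregation operators are listed with
-- task = mpp[tiflash] instead of running in the root task.
EXPLAIN
SELECT /*+ read_from_storage(tiflash[result_daily_sku_id,shop]) */
    `shop`.`shop_category`, COUNT(1) num
FROM `result_daily_sku_id`
         LEFT JOIN `shop` ON `result_daily_sku_id`.`shop_id` = `shop`.`id`
WHERE `result_daily_sku_id`.`day` BETWEEN '2023-01-01' AND '2023-07-31' AND shop.id IN (12, 17)
GROUP BY `shop`.`shop_category`;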

| username: 青木科技-熊猛 | Original post link

I tried it and it didn’t work. Could it be that the optimizer considers the Batch_Point_Get operator so good that it simply ignores these settings?

| username: 有猫万事足 | Original post link

Your SQL should be equivalent to the one below, try this one:

select shop.shop_category, sum(t.num) num
from (
    select rd.shop_id, count(1) num
    from result_daily_sku_id rd
    where rd.day BETWEEN '2023-01-01' AND '2023-07-31' and rd.shop_id in (12, 17)
    group by rd.shop_id
) t
left join shop on shop.id = t.shop_id
group by shop.shop_category;

The idea is to first try to aggregate the data from the large table, and then join it with the smaller table.

| username: 青木科技-熊猛 | Original post link

Yes. However, we are doing low-code BI development, and it is quite difficult to assemble SQL like this.

| username: h5n1 | Original post link

Try executing the SQL manually after running set session tidb_isolation_read_engines="tiflash,tidb" to see if it can use TiFlash.
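
A sketch of that workaround (the default value of tidb_isolation_read_engines is "tikv,tiflash,tidb", so it can be restored afterwards):

-- Remove tikv from the readable engines for this session, so the optimizer
-- cannot choose a TiKV point get for the shop table; tidb is kept for the
-- internal memory tables.
set session tidb_isolation_read_engines="tiflash,tidb";

-- ... run the report query here ...

-- Restore the default engine list when done.
set session tidb_isolation_read_engines="tikv,tiflash,tidb";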

| username: 青木科技-熊猛 | Original post link

I tried it, and setting that variable indeed makes it work.

| username: h5n1 | Original post link

Which version are you using? The optimizer probably needs improvement.

| username: 青木科技-熊猛 | Original post link

The version is 6.5

| username: redgame | Original post link

In this scenario, should we try to avoid the Batch_Point_Get operator as much as possible?

| username: 有猫万事足 | Original post link

It should be said that the Batch_Point_Get operator itself is not the core of the problem.

The core of the problem is that the join and aggregation should be done as much as possible inside TiFlash, so that only the result is handed back to TiDB.
If raw rows are fetched from TiFlash and the merge join and hash aggregation are then done in TiDB, it is a lose-lose outcome:
neither side gets to play to its strengths.

| username: TiDBer_iCdTOZ1r | Original post link

Check which version it is.

| username: ShawnYan | Original post link

The effect is the same when tested on 7.2:

5.7.25-TiDB-v7.2.0 (root@shawnyan) [test] 09:31:02> set session tidb_isolation_read_engines=tiflash;
Query OK, 0 rows affected (0.001 sec)

5.7.25-TiDB-v7.2.0 (root@shawnyan) [test] 09:31:13> tidb_isolation_read_engines=\c
5.7.25-TiDB-v7.2.0 (root@shawnyan) [test] 09:31:17> EXPLAIN ANALYZE
    -> SELECT /*+ read_from_storage(tiflash[result_daily_sku_id,shop]) */
    ->     `shop`.`shop_category`, COUNT(1) num
    -> FROM `result_daily_sku_id`
    ->          LEFT JOIN `shop` ON `result_daily_sku_id`.`shop_id` = `shop`.`id`
    -> WHERE `result_daily_sku_id`.`day` BETWEEN '2023-01-01' AND '2023-07-31' AND shop.id IN (12, 17)
    -> GROUP BY `shop`.`shop_category`;
+------------------------------------------------+---------+---------+--------------+---------------------------+----------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                                             | estRows | actRows | task         | access object             | execution info                                                                                           | operator info                                                                                                                                  | memory    | disk |
+------------------------------------------------+---------+---------+--------------+---------------------------+----------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| TableReader_58                                 | 1.00    | 0       | root         |                           | time:25.7ms, loops:1, RU:0.000000, cop_task: {num: 1, max: 0s, proc_keys: 0, copr_cache_hit_ratio: 0.00} | MppVersion: 1, data:ExchangeSender_57                                                                                                          | 707 Bytes | N/A  |
| └─ExchangeSender_57                        | 1.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:21ms, loops:0, threads:4}                                                             | ExchangeType: PassThrough                                                                                                                      | N/A       | N/A  |
|   └─Projection_9                           | 1.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:20ms, loops:0, threads:4}                                                             | test.shop.shop_category, Column#10                                                                                                             | N/A       | N/A  |
|     └─Projection_52                        | 1.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:20ms, loops:0, threads:4}                                                             | Column#10, test.shop.shop_category                                                                                                             | N/A       | N/A  |
|       └─HashAgg_50                         | 1.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:20ms, loops:0, threads:4}                                                             | group by:test.shop.shop_category, funcs:count(1)->Column#10, funcs:firstrow(test.shop.shop_category)->test.shop.shop_category, stream_count: 4 | N/A       | N/A  |
|         └─ExchangeReceiver_36              | 0.62    | 0       | mpp[tiflash] |                           | tiflash_task:{time:20ms, loops:0, threads:4}                                                             | stream_count: 4                                                                                                                                | N/A       | N/A  |
|           └─ExchangeSender_35              | 0.62    | 0       | mpp[tiflash] |                           | tiflash_task:{time:17.3ms, loops:0, threads:8}                                                           | ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: test.shop.shop_category, collate: utf8mb4_general_ci], stream_count: 4       | N/A       | N/A  |
|             └─HashJoin_34                  | 0.62    | 0       | mpp[tiflash] |                           | tiflash_task:{time:16.3ms, loops:0, threads:8}                                                           | inner join, equal:[eq(test.result_daily_sku_id.shop_id, test.shop.id)]                                                                         | N/A       | N/A  |
|               ├─ExchangeReceiver_22(Build) | 5.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:15.3ms, loops:0, threads:8}                                                           |                                                                                                                                                | N/A       | N/A  |
|               │ └─ExchangeSender_21      | 5.00    | 0       | mpp[tiflash] |                           | tiflash_task:{time:9.11ms, loops:0, threads:1}                                                           | ExchangeType: Broadcast, Compression: FAST                                                                                                     | N/A       | N/A  |
|               │   └─TableRangeScan_20    | 5.00    | 0       | mpp[tiflash] | table:result_daily_sku_id | tiflash_task:{time:9.11ms, loops:0, threads:1}                                                           | range:[12 2023-01-01,12 2023-07-31], [17 2023-01-01,17 2023-07-31], keep order:false, stats:pseudo                                             | N/A       | N/A  |
|               └─TableRangeScan_23(Probe)   | 2.00    | 0       | mpp[tiflash] | table:shop                | tiflash_task:{time:14.3ms, loops:0, threads:1}                                                           | range:[12,12], [17,17], keep order:false, stats:pseudo                                                                                         | N/A       | N/A  |
+------------------------------------------------+---------+---------+--------------+---------------------------+----------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
12 rows in set (0.124 sec)

| username: ShawnYan | Original post link

A single point query is the least costly.


A small suggestion for the original poster: put the SQL statements in markdown code blocks, which makes them easier for everyone to copy. Also, the shop table’s CREATE statement still ends with a full-width Chinese semicolon 【;】.

| username: 青木科技-熊猛 | Original post link

Got it. The cost of a single full scan is minimal, but the cost of subsequent joins and aggregation calculations will be high.

| username: 青木科技-熊猛 | Original post link

The version is 6.5

| username: jansu-dev | Original post link

Are the solutions provided by the experts above acceptable to you?

It seems the issues can be summarized as follows:

  1. Why is the hint not effective? This behavior contradicts how hints are supposed to work.
    → Reason: a similar case is reported in Hint to read from tiflash doesn’t work when the query is a PointGet · Issue #16909 · pingcap/tidb (github.com). Judging from the unmerged PR, a point range causes the other AccessPaths to be ignored, so reading the data from TiFlash is never even considered.
    → The experts above have also provided workarounds.

  2. Will fully using MPP really be faster? If the current tests show an improvement, we can use the methods above as a workaround :joy:.
    Actually, if the main table were larger, it might trigger some other limitations of TiFlash, but the current data volume is still manageable :thinking:.

If there are no issues, we can select one of the answers above as a workaround and conclude the discussion.

| username: 青木科技-熊猛 | Original post link

As for whether fully using MPP will actually be faster: at this data volume it is already fast.

Secondly, if the main table is larger, what limitations will be triggered?

“If there are no issues, we can select one of the answers above as a workaround and conclude the discussion.” How exactly do I do that? I’ve already selected the best answer.

| username: jansu-dev | Original post link

For example: Packet is too large to send → The block returned by InputStream should obey the max_block_size constraint · Issue #3436 · pingcap/tiflash (github.com)
If a single MPPDataPacket exceeds 2 GB, this error occurs.

| username: jansu-dev | Original post link

Selecting the best answer concludes the topic; it probably hadn’t been selected yet when I looked.