Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: 在tiflash中join表的时候走了Batch_Point_Get算子从而导致没法用上tiflash的mpp算法

Main Table:
CREATE TABLE `result_daily_sku_id` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
`day` date NOT NULL DEFAULT '1970-01-01' COMMENT 'Date',
`brand_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Brand ID',
`shop_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Shop ID',
`sku_id` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Platform Barcode ID',
`sku_sn` varchar(100) COLLATE utf8mb4_general_ci NOT NULL DEFAULT '' COMMENT 'Merchant Code',
`is_live` tinyint(2) unsigned NOT NULL DEFAULT '0' COMMENT 'Is Live Product 0: No 1: Yes',
PRIMARY KEY (`shop_id`,`day`,`sku_id`,`sku_sn`,`is_live`) /*T![clustered_index] CLUSTERED */,
UNIQUE KEY `idx_uniq_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=1 COMMENT='Main Table';
Secondary Table:
CREATE TABLE `shop` (
`id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'Shop ID',
`shop_category` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL COMMENT 'Shop Category',
PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=1 COMMENT='Secondary Table';
Executing the following SQL:
EXPLAIN ANALYZE
SELECT /*+ read_from_storage(tiflash[result_daily_sku_id,shop]) */
`shop`.`shop_category`, COUNT(1) num
FROM `result_daily_sku_id`
LEFT JOIN `shop` ON `result_daily_sku_id`.`shop_id` = `shop`.`id`
WHERE `result_daily_sku_id`.`day` BETWEEN '2023-01-01' AND '2023-07-31' AND shop.id IN (12, 17)
GROUP BY `shop`.`shop_category`;
Because the secondary table shop
data is accessed via the Batch_Point_Get operator, TiFlash only reads the data and returns it to the root node, which happens to use MergeJoin. This algorithm is not very optimal. When the main table has a large amount of data, the SQL will be very slow.
How can we prevent the shop
table from using Batch_Point_Get and instead use TableFullScan in TiFlash, so that the entire aggregation calculation uses the MPP mode of TiFlash?