Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: TiFlash 如何优化才能更快点?
An online business SQL query takes 55 seconds on TiKV, but only 6 seconds when forced to use TiFlash. However, the development team is concerned that with only 10 days of test data, the query might become very slow once a year’s worth of data is added. How can we optimize TiFlash to improve query speed?
- TiFlash has 2 nodes, with 2 replicas.
- The SQL query is as follows:
SELECT
*
FROM
(SELECT
channel.reg_date,
iap.iap_1w_amount,
iap.iap_1_amount,
cost.cost,
channel.retention_cnt,
channel.iap_amount,
iap.iap_2w_user,
retention.retention_2w_cnt,
iap.iap_3_user,
iap.iap_1w_user,
channel.reg_cnt,
channel.iap_cnt,
retention.retention_3_cnt,
channel.install_cnt,
retention.retention_1_cnt,
iap.iap_1_user,
retention.retention_1w_cnt,
cost.show_cnt,
channel.char_cnt,
cost.click_cnt,
retention.retention_2_cnt,
iap.iap_2_user,
iap.iap_3_amount,
iap.iap_2w_amount,
iap.iap_2_amount
FROM
(SELECT
acs.reg_date,
SUM(retention_cnt) AS retention_cnt,
SUM(iap_amount) AS iap_amount,
SUM(reg_cnt) AS reg_cnt,
SUM(iap_cnt) AS iap_cnt,
SUM(install_cnt) AS install_cnt,
SUM(char_cnt) AS char_cnt
FROM
ad_channel_stat_1051 acs
LEFT JOIN ad_campaign_config c
ON acs.campaign_id = c.campaign_id
WHERE acs.user_type = 1
AND acs.reg_date >= '2022-11-01'
AND acs.reg_date <= '2022-11-30'
AND acs.gw_id = 105102
GROUP BY acs.reg_date) channel
LEFT JOIN
(SELECT
ars.reg_date,
SUM(retention_2w_cnt) AS retention_2w_cnt,
SUM(retention_3_cnt) AS retention_3_cnt,
SUM(retention_1_cnt) AS retention_1_cnt,
SUM(retention_1w_cnt) AS retention_1w_cnt,
SUM(retention_2_cnt) AS retention_2_cnt
FROM
ad_retention_stat_1051 ars
LEFT JOIN ad_campaign_config c
ON ars.campaign_id = c.campaign_id
WHERE ars.user_type = 1
AND ars.reg_date >= '2022-11-01'
AND ars.reg_date <= '2022-11-30'
AND ars.gw_id = 105102
GROUP BY ars.reg_date) retention
ON channel.reg_date = retention.reg_date
LEFT JOIN
(SELECT
ais.reg_date,
SUM(iap_1w_amount) AS iap_1w_amount,
SUM(iap_1_amount) AS iap_1_amount,
SUM(iap_2w_user) AS iap_2w_user,
SUM(iap_3_user) AS iap_3_user,
SUM(iap_1w_user) AS iap_1w_user,
SUM(iap_1_user) AS iap_1_user,
SUM(iap_2_user) AS iap_2_user,
SUM(iap_3_amount) AS iap_3_amount,
SUM(iap_2w_amount) AS iap_2w_amount,
SUM(iap_2_amount) AS iap_2_amount
FROM
ad_iap_stat_1051 ais
LEFT JOIN ad_campaign_config c
ON ais.campaign_id = c.campaign_id
WHERE ais.user_type = 1
AND ais.reg_date >= '2022-11-01'
AND ais.reg_date <= '2022-11-30'
AND ais.gw_id = 105102
GROUP BY ais.reg_date) iap
ON channel.reg_date = iap.reg_date
LEFT JOIN
(SELECT
acs.report_date,
SUM(cost) AS cost,
SUM(show_cnt) AS show_cnt,
SUM(click_cnt) AS click_cnt
FROM
ad_cost_stat_1051 acs
LEFT JOIN ad_campaign_config c
ON acs.campaign_id = c.campaign_id
WHERE acs.report_date >= '2022-11-01'
AND acs.report_date <= '2022-11-30'
AND acs.gw_id = 105102
GROUP BY acs.report_date) cost
ON channel.reg_date = cost.report_date) tmp
LIMIT 1000 OFFSET 0;
- The execution plan is as follows:
...
- Does adding more TiFlash nodes improve query speed?
Adding more TiFlash nodes can potentially improve query speed by distributing the workload across more nodes, thus increasing parallelism and reducing the time taken for each node to process its share of the data. However, the actual performance gain depends on various factors such as network latency, data distribution, and the specific query patterns. It is also important to ensure that the TiFlash nodes are properly configured and that the data is evenly distributed across the nodes to fully leverage the additional resources.