TiFlash Distinct is Very Slow, Seeking Optimization

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash distinct 非常慢,求优化

| username: LKLK

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

In the online v5.4.1 environment, queries using TiFlash with distinct are very slow. Even after setting set @@tidb_opt_distinct_agg_push_down = 1;, the performance is still very slow. However, removing distinct makes it very fast. How can this be optimized?

SELECT * FROM (select login_date, sum(login_cnt) as login_cnt, count(distinct id) as active_user from log_login_1051 where login_date >= '2022-11-29' AND login_date <= '2022-12-28' group by login_date) tmp ORDER BY login_date ASC LIMIT 1000 OFFSET 0;
MySQL [xxx]> select count(1) from log_login_1051;
+-----------+
| count(1)  |
+-----------+
| 242504116 |
+-----------+
1 row in set (0.67 sec)
| username: h5n1 | Original post link

Post the execution plan result of explain analyze.

| username: LKLK | Original post link

You can only execute explain, not explain analyze, otherwise it will cause OOM.

| username: Bing | Original post link

Post the execution plan result of explain analyze.

| username: forever | Original post link

How about collecting the statistics?

| username: LKLK | Original post link

In the above explain3.

| username: Minorli-PingCAP | Original post link

Try refreshing the statistics and then give it another shot.