High CPU Usage with tilflash - Slow SQL - sum() on 90 Columns, 4 Million Records

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tilflash CPU使用率高- 慢SQL-sum() 90个列,400w数据

| username: qiuxb

To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
【TiDB Usage Environment】Production Environment or Test Environment or POC
Production Environment
【TiDB Version】
v4.0.8
【Encountered Problem】
High CPU usage on the machine where TiFlash is located, able to pinpoint specific slow queries

select report_time, sum(1), sum(2), sum(3)…sum(90)
from t1
where report_time >= ‘2022-10-16 00:00:00’ and report_time < ‘2022-10-17 10:00:00’
group by report_time

The t1 table is a large wide table with about 90 fields. It needs to group by the report_time field and aggregate the values of the other 90 fields.

The data volume for one day is about 4 million, and the report_time range is just one day. The data volume is not very large, but there are 90 fields to aggregate.

Some simple tests have been done, reducing the number of sum fields can reduce the SQL execution time, but the business side needs all fields to be aggregated.

This has stalled our optimization efforts.

【Reproduction Path】What operations were performed to encounter the problem
【Problem Phenomenon and Impact】

【Attachments】 Related logs and monitoring (https://metricstool.pingcap.com/)


If the question is about performance optimization or fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results and upload them.

| username: WalterWj | Original post link

If this table is not a partitioned table, this SQL would be better off using TiKV.
Also, this looks like a batch processing SQL, and the semantics seem to analyze the sum by each second? It feels very strange.

| username: qiuxb | Original post link

It is a partitioned table, partitioned by day. I tried using TiKV, and after using TiKV, the CPU usage of all three nodes increased simultaneously, causing the response time of other requests on the business side to become longer and be affected.

| username: Lucien-卢西恩 | Original post link

Confirm a few issues:

  1. Is TiFlash configured according to standard specifications without lacking CPU and memory?
  2. Provide the actual execution plan of the SQL to see if there is any room for optimization?
| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.