Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: explain analyze分析tidb cpu 内存直接打满
CPU 8 cores
Memory 32GB
The dataset is around 200 million rows
Executing a query: select count(*) num, imei from test group by imei order by num desc
CPU and memory usage remain high
I would like to ask whether it is a version (V6.1.0) issue or a usage issue
What is the resolution of imei? Please send the execution plan.
explain analyze is equivalent to executing SQL.
Try removing the order by and see.
From the principles of distributed computing, TiKV performs aggregation calculations, with each TiKV node sending its own calculation results to the TiDB node. The TiDB node then performs summary calculations and sorting. Therefore, there should be no improvement. This should be entirely determined by the discreteness of the IMEI. The greater the discreteness, the greater the sorting consumption on the TiDB node. The smaller the discreteness, the smaller the consumption on the TiDB node.
You have a large amount of data, right? And you still need to sort it?
Adding an index will speed it up.
Remove the sorting with “order by null”, and sort it externally. Have you added an index to the IMEI? If not, add it.
Indexes have been added, and now the service crashes directly.
The execution plan still uses TiFlash, and the service crashes directly when running.
CPU: 8 cores
Memory: 32GB
Dataset: around 200 million rows
Executing a query: select count(*), imei from test group by imei
In this case, create an index on imei
.
As long as you don’t mind it being slow, you might as well try:
set tidb_distsql_scan_concurrency=3;
Check if the execution plan takes effect:
explain select /+ stream_agg() / count() num, imei from test group by imei order by num desc;
Execute:
explain analyze select /+ stream_agg() / count() num, imei from test group by imei order by num desc;
I helped him solve this problem. The teacher deployed TiDB, PD, and TiKV on the same server and also added TiFlash. The system resources were insufficient. I have already asked him to redeploy each component independently.