Explain Analyze: Analyzing TiDB CPU and Memory Usage Spikes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: explain analyze分析tidb cpu 内存直接打满

| username: TiDBer_5M9L07sN

CPU 8 cores
Memory 32GB
The dataset is around 200 million rows
Executing a query: select count(*) num, imei from test group by imei order by num desc
CPU and memory usage remain high
I would like to ask whether it is a version (V6.1.0) issue or a usage issue

| username: WalterWj | Original post link

:thinking: What is the resolution of imei? Please send the execution plan.
explain analyze is equivalent to executing SQL.

| username: Kongdom | Original post link

Try removing the order by and see.

| username: 林夕一指 | Original post link

From the principles of distributed computing, TiKV performs aggregation calculations, with each TiKV node sending its own calculation results to the TiDB node. The TiDB node then performs summary calculations and sorting. Therefore, there should be no improvement. This should be entirely determined by the discreteness of the IMEI. The greater the discreteness, the greater the sorting consumption on the TiDB node. The smaller the discreteness, the smaller the consumption on the TiDB node.

| username: Hacker007 | Original post link

You have a large amount of data, right? And you still need to sort it?

| username: tidb狂热爱好者 | Original post link

Adding an index will speed it up.

| username: Tank001 | Original post link

Remove the sorting with “order by null”, and sort it externally. Have you added an index to the IMEI? If not, add it.

| username: TiDBer_5M9L07sN | Original post link

Indexes have been added, and now the service crashes directly.

| username: TiDBer_5M9L07sN | Original post link

The execution plan still uses TiFlash, and the service crashes directly when running.

| username: tidb狂热爱好者 | Original post link

CPU: 8 cores
Memory: 32GB
Dataset: around 200 million rows
Executing a query: select count(*), imei from test group by imei
In this case, create an index on imei.

| username: 人如其名 | Original post link

As long as you don’t mind it being slow, you might as well try:
set tidb_distsql_scan_concurrency=3;
Check if the execution plan takes effect:
explain select /+ stream_agg() / count() num, imei from test group by imei order by num desc;
Execute:
explain analyze select /
+ stream_agg() / count() num, imei from test group by imei order by num desc;

| username: tidb狂热爱好者 | Original post link

I helped him solve this problem. The teacher deployed TiDB, PD, and TiKV on the same server and also added TiFlash. The system resources were insufficient. I have already asked him to redeploy each component independently.

| username: Kongdom | Original post link

:+1: :+1: :+1: