The "select count(1) from table is very slow"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: select count(1) from 表很慢 (select count(1) from a table is very slow)

| username: TiDBer_FMWXa7ja

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】6.5
【Reproduction Path】
【Encountered Problem: Problem Phenomenon and Impact】TiFlash is not installed, and running select count(1) on a table with 1 billion rows is very slow. Is this normal? What are the possible solutions?
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】

| username: buddyyuan | Original post link

Check the resource usage of TiKV.

| username: Kongdom | Original post link

Does the table have a primary key? Are there conditions like “order by” in the complete statement?

| username: h5n1 | Original post link

If there is no index, count(1) will scan the entire table, while count(*) will use the primary key index, etc.
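If in doubt, it is easy to compare the access paths yourself; a minimal check, using an illustrative table name t:

-- compare the plans the optimizer picks for the two forms
explain select count(1) from t;
explain select count(*) from t;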

| username: Kongdom | Original post link

Is there a difference between these two? In actual use, count(1) and count(*) seem to behave the same, but internally our company has always believed that count(1) is more efficient than count(*).

| username: TiDBer_FMWXa7ja | Original post link

The table has a composite primary key, and the statement has no conditions: select count(*) from d_secondminutedata;

| username: Kongdom | Original post link

Execute this statement to see the execution plan:
explain analyze select count(*) from d_secondminutedata;

| username: h5n1 | Original post link

I tried it and it’s the same.

| username: Kongdom | Original post link

:+1: Practical expert

| username: TiDBer_FMWXa7ja | Original post link

[The original post contained only a screenshot, which was not translated.]

| username: h5n1 | Original post link

What is your resource configuration and disk type? 47 seconds for 1 billion rows is not bad.

| username: TiDBer_FMWXa7ja | Original post link

6 TiKV nodes, each with 16 cores and 16 GB of memory. The 47 seconds is from the second execution; the first execution took almost 200 seconds.

| username: 我是咖啡哥 | Original post link

I have encountered this before. If DML on the table is very frequent, the count will be slow; after stopping the application, the result comes back in seconds.
For a table of around 300,000 rows, the difference was 12 seconds versus 0.2 seconds.
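A rough way to check whether piled-up old versions (heavy DML that GC has not yet cleaned up) are involved, assuming TiDB v5.0 or later and using this thread's table name as an example:

-- how long old row versions are kept before GC removes them
show global variables like 'tidb_gc_life_time';
-- approximate row count and recent modifications from statistics, without scanning the table
show stats_meta where table_name = 'd_secondminutedata';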

| username: h5n1 | Original post link

What type of disk is it? Please upload the results of the explain analyze. Some parts of the screenshot are not visible.

| username: buddyyuan | Original post link

Please upload this statement's entry from the slow log.
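For example, assuming the statement crossed the slow-log threshold, something along these lines pulls the key fields from the cluster slow-query table (the LIKE pattern is only illustrative):

-- query_time vs cop_time/process_time and the key counts show where the time went
select time, query_time, cop_time, process_time, process_keys, total_keys, query
from information_schema.cluster_slow_query
where query like '%count%d_secondminutedata%'
order by time desc limit 5;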

| username: liuis | Original post link

Let’s take a look at the explain analyze.

| username: WalterWj | Original post link

With 1 billion records, if TiKV is not very busy and there are enough CPUs, it should be able to complete the count in just a few seconds. :thinking:

| username: ffeenn | Original post link

What is the resource configuration of the TiDB nodes?

| username: 人如其名 | Original post link

It seems that TiDB equates count(1) and count(*) at the parser level.

| username: 人如其名 | Original post link

Look at this, it’s so slow…

mysql> explain analyze select /*+ read_from_storage(tikv[lineitem]) */ count(*) from lineitem;
+----------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-----------+------+
| id                         | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                                                                                                                                   | operator info                     | memory    | disk |
+----------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-----------+------+
| HashAgg_13                 | 1.00        | 1        | root      |                | time:39.4s, loops:2, partial_worker:{wall_time:39.362488482s, concurrency:5, task_num:1, tot_wait:3m16.81108575s, tot_exec:94.27µs, tot_time:3m16.811192801s, max:39.362249552s, p95:39.362249552s}, final_worker:{wall_time:0s, concurrency:5, task_num:1, tot_wait:3m16.811393418s, tot_exec:30.083µs, tot_time:3m16.811426486s, max:39.362296841s, p95:39.362296841s}                         | funcs:count(Column#18)->Column#17 | 16.5 KB   | N/A  |
| └─TableReader_14           | 1.00        | 156      | root      |                | time:39.4s, loops:2, cop_task: {num: 159, max: 7.69s, min: 212.8ms, avg: 3.54s, p95: 7.15s, max_proc_keys: 462798, p95_proc_keys: 462756, tot_proc: 8m47.6s, tot_wait: 33.5s, rpc_num: 159, rpc_time: 9m23.4s, copr_cache_hit_ratio: 0.00, build_task_duration: 274.2µs, max_distsql_concurrency: 15}, backoff{regionMiss: 2ms}                                                                  | data:HashAgg_6                    | 438 Bytes | N/A  |
|   └─HashAgg_6              | 1.00        | 156      | cop[tikv] |                | tikv_task:{proc max:7.2s, min:212ms, avg: 3.32s, p80:5.75s, p95:6.83s, iters:70417, tasks:159}, scan_detail: {total_process_keys: 72080384, total_process_keys_size: 2594893824, total_keys: 73692884, get_snapshot_time: 16.4s, rocksdb: {delete_skipped_count: 130912, key_skipped_count: 73762895, block: {cache_hit_count: 2208, read_count: 455143, read_byte: 4.43 GB, read_time: 5.11s}}} | funcs:count(1)->Column#18         | N/A       | N/A  |
|     └─TableFullScan_12     | 70899712.00 | 72080384 | cop[tikv] | table:lineitem | tikv_task:{proc max:7.2s, min:211ms, avg: 3.32s, p80:5.75s, p95:6.83s, iters:70417, tasks:159}                                                                                                                                                                                                                                                                                                   | keep order:false                  | N/A       | N/A  |
+----------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-----------+------+
4 rows in set (39.36 sec)

Main slow points:

  1. copr_cache_hit_ratio=0, no cache hit on the TiDB side.
  2. get_snapshot_time: 16.4s, not sure why it’s so slow…
  3. read_byte: 4.43 GB, read_time: 5.11s, physical read occurred.
  4. The table scan itself is slow. My understanding: time:39.4s - get_snapshot_time:16.4s - read_time:5.11s = 17.89s of remaining scan time.
    17.89s * 15 (max_distsql_concurrency) / 159 tasks ≈ 1.69s, i.e. a purely in-memory scan of one region averages about 1.69 seconds, which feels very slow…
    Data was being loaded concurrently at the time, so the CPU was probably insufficient. But why is get_snapshot_time so high?

Sometimes 15 tasks working together can barely keep up with one task in a traditional database. It feels like the data read and scan efficiency of TiKV is not high.
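Since the original question notes that TiFlash is not installed: for full-table analytical counts like this, the usual route is to add a columnar replica and push the aggregation down to TiFlash. A rough sketch, assuming TiFlash nodes have been added to the cluster (replica count and table name are illustrative):

-- create a columnar replica of the table on TiFlash
alter table d_secondminutedata set tiflash replica 1;
-- wait until AVAILABLE = 1 before relying on it
select table_name, replica_count, available, progress
from information_schema.tiflash_replica
where table_name = 'd_secondminutedata';
-- then force the count onto TiFlash and compare with the TiKV plan above
select /*+ read_from_storage(tiflash[d_secondminutedata]) */ count(*) from d_secondminutedata;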