There is a significant difference between disk read statistics and actual execution time when TiKV performs disk reads

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv发生磁盘读时磁盘读统计信息和实际执行时间差异大 (When TiKV performs disk reads, the disk read statistics differ greatly from the actual execution time)

| username: 人如其名

[TiDB Usage Environment] PoC
[TiDB Version] v7.0.0

For a tpch100 statement: select /*+ read_from_storage(tikv[a]) */ sql_no_cache count(*) from customer a where C_ACCTBAL < -10000;

When RocksDB performs disk reads, the statement takes about 10 seconds; when there are no disk reads, it takes about 5 seconds. However, the execution plan shows only a few hundred milliseconds of disk read time. The numbers don't add up, so where is the time being spent?

-- The TiDB coprocessor cache is disabled (tikv-client.copr-cache.capacity-mb = 0)
mysql> show config where name like 'tikv-client.copr-cache.capacity-mb';
+------+---------------------+------------------------------------+-------+
| Type | Instance            | Name                               | Value |
+------+---------------------+------------------------------------+-------+
| tidb | 192.168.31.201:4000 | tikv-client.copr-cache.capacity-mb | 0     |
+------+---------------------+------------------------------------+-------+
1 row in set (0.01 sec)

-- Set distsql concurrency to 1 to avoid task backlog in TiKV's coprocessor.
mysql> set tidb_distsql_scan_concurrency=1;
Query OK, 0 rows affected (0.01 sec)
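-- As a quick sanity check (not part of the original session), the session-scoped value can be read back before running the tests.
-- Expected to return 1 after the SET above
mysql> select @@tidb_distsql_scan_concurrency;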

-- readpool.coprocessor.low-concurrency=2, but since tidb_distsql_scan_concurrency=1, there will be no backlog.
mysql> show config where name like 'readpool.coprocessor%' and type='tikv';
+------+----------------------+--------------------------------------------------+-------+
| Type | Instance             | Name                                             | Value |
+------+----------------------+--------------------------------------------------+-------+
| tikv | 192.168.31.201:20160 | readpool.coprocessor.high-concurrency            | 9     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.low-concurrency             | 2     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-high   | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-low    | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-normal | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.normal-concurrency          | 2     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.stack-size                  | 10MiB |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.use-unified-pool            | false |
+------+----------------------+--------------------------------------------------+-------+
8 rows in set (0.01 sec)

-- First run with sql_no_cache to avoid loading data into block-cache, resulting in more physical reads: read_byte: 1.13 GB, read_time: 282.8ms
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */ sql_no_cache count(*) from customer a where C_ACCTBAL < -10000;
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                        | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:10.2s, loops:2, RRU:49975.958141, WRU:0.000000                                                                                                                                                                                                                                                                                                                   | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:10.2s, loops:1, cop_task: {num: 33, max: 453.4ms, min: 255.6ms, avg: 309.6ms, p95: 356.1ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 10.2s, tot_wait: 2.61ms, rpc_num: 33, rpc_time: 10.2s, copr_cache: disabled, build_task_duration: 46.6µs, max_distsql_concurrency: 1}                                                                        | data:StreamAgg_9                       | 380 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:453ms, min:255ms, avg: 308.5ms, p80:316ms, p95:355ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 1.58ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 170, read_count: 99848, read_byte: 1.13 GB, read_time: 282.8ms}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:453ms, min:254ms, avg: 308ms, p80:316ms, p95:355ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                        | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:411ms, min:230ms, avg: 282.4ms, p80:291ms, p95:327ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (10.22 sec)

-- Second run without sql_no_cache; because the previous run used sql_no_cache and therefore did not populate the block cache (to avoid cache pollution), this run still triggers physical reads: read_byte: 1.13 GB, read_time: 316.5ms.
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */ count(*) from customer a where C_ACCTBAL < -10000;
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                        | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:11.2s, loops:2, RRU:50306.220368, WRU:0.000000                                                                                                                                                                                                                                                                                                                   | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:11.2s, loops:1, cop_task: {num: 33, max: 480ms, min: 286.3ms, avg: 339.9ms, p95: 391.7ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 11.2s, tot_wait: 5.53ms, rpc_num: 33, rpc_time: 11.2s, copr_cache: disabled, build_task_duration: 55.8µs, max_distsql_concurrency: 1}                                                                          | data:StreamAgg_9                       | 381 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:479ms, min:284ms, avg: 338.8ms, p80:351ms, p95:389ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 4.08ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 171, read_count: 99847, read_byte: 1.13 GB, read_time: 316.5ms}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:478ms, min:284ms, avg: 338.2ms, p80:351ms, p95:388ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:441ms, min:268ms, avg: 311.9ms, p80:325ms, p95:335ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (11.22 sec)

-- Third run shows no physical reads and completes in 5.79 seconds, about 5 seconds faster than before. Yet the only difference in the previous run was the additional physical reads, everything else was the same, and the reported physical read time was only 316 milliseconds, far less than the 5-second gap.
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */ count(*) from customer a where C_ACCTBAL < -10000;
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:5.78s, loops:2, RRU:48498.923087, WRU:0.000000                                                                                                                                                                                                                                                           | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:5.78s, loops:1, cop_task: {num: 33, max: 268ms, min: 126.8ms, avg: 175.2ms, p95: 196ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 5.75s, tot_wait: 2.9ms, rpc_num: 33, rpc_time: 5.78s, copr_cache: disabled, build_task_duration: 87.7µs, max_distsql_concurrency: 1}                     | data:StreamAgg_9                       | 357 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:266ms, min:126ms, avg: 174.2ms, p80:187ms, p95:195ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 1.65ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 100018}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:266ms, min:126ms, avg: 173.8ms, p80:187ms, p95:193ms, iters:14656, tasks:33}                                                                                                                                                                                                              | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:231ms, min:105ms, avg: 150.5ms, p80:161ms, p95:174ms, iters:14656, tasks:33}                                                                                                                                                                                                              | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (5.79 sec)
| username: tidb菜鸟一只 | Original post link

rpc_num, rpc_time: The total number and total time of Cop type RPC requests sent to TiKV.
It looks like most of your time is spent on rpc_time.
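For reference, dividing rpc_time by rpc_num from the first plan gives the per-request average, which matches the reported cop_task avg: the time is accumulated across the 33 cop requests on the TiKV side rather than in one long stall (a simple check, not from the original reply):

-- rpc_time / rpc_num from the first plan: 10.2s over 33 cop requests
mysql> select 10.2 / 33 as avg_seconds_per_cop_request;
-- ≈ 0.309 s, consistent with the reported cop_task avg of 309.6ms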

| username: 人如其名 | Original post link

Because the full table scan takes twice as long, the RPC time also doubles, since rpc_time is simply the total time spent on the TiKV side. The scan is twice as slow, yet the total number of keys scanned is unchanged; the only difference is the disk I/O, so disk I/O looks like the main reason for the slowdown. But the read_time in the execution plan is only a few hundred milliseconds, so the numbers don't add up.
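To put numbers on this mismatch (values copied from the second and third plans above; this only quantifies the gap, it does not explain it):

-- Extra TiKV processing time when physical reads occur (run 2 tot_proc minus run 3 tot_proc),
-- compared with the rocksdb block read_time reported for run 2
mysql> select 11.2 - 5.75 as extra_tikv_process_seconds, 0.3165 as rocksdb_read_time_seconds;
-- ≈ 5.45 s of extra processing vs. only ≈ 0.32 s attributed to block reads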

| username: tidb菜鸟一只 | Original post link

The average time per TiKV task has roughly halved, right? That is the time saved by reading from the cache; the time spent filtering the data cannot be saved.
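The per-task averages in the plans bear this out; a back-of-the-envelope check (not in the original reply) using the avg tikv_task proc times of 338.8 ms (run 2) and 174.2 ms (run 3) over 33 tasks roughly reproduces the wall-clock gap:

-- Difference in average task time, scaled to the 33 cop tasks
mysql> select 338.8 - 174.2 as per_task_diff_ms, (338.8 - 174.2) * 33 / 1000 as total_diff_seconds;
-- ≈ 164.6 ms per task, ≈ 5.43 s in total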

| username: xfworld | Original post link

I searched with AI and it returned the following:

According to the “TiKV Hotspot Read Issue Troubleshooting” section of the official TiKV documentation, you can use TiKV's monitoring to determine whether heavy read traffic is driving up CPU on a particular TiKV instance. Specifically, check Thread CPU → Unified Read Pool and TiKV → MBps in the TiKV-Details dashboard. If the Unified Read Pool CPU on one TiKV instance is clearly abnormal and, at the same time, that instance shows a pronounced peak in read traffic (MBps), then that instance has a hotspot read issue.

Additionally, TiKV provides some statistics on disk reads, such as tikv_import_write_chunk_bytes, tikv_import_write_chunk_duration, tikv_import_upload_chunk_bytes, tikv_import_upload_chunk_duration, and tikv_import_range_delivery_duration. These can be used to understand the disk read situation of TiKV. However, if there is a significant difference between the disk read statistics and the actual execution time, further investigation into the hotspot read issue on the TiKV instance may be necessary.


It feels like the AI's answer still falls short on many of these judgment calls.

We need more internal metrics and observational capabilities to help us solve the puzzle.
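One place that already records both sides of this comparison per statement is the slow query table, assuming the statement exceeds the slow log threshold. The RocksDB counters there mirror the scan_detail fields in explain analyze; the column names below are my recollection for recent TiDB versions and may differ slightly:

-- Hedged example: compare process time with the RocksDB block-read counters for the test statement
mysql> select time, query_time, process_time,
              rocksdb_block_read_count, rocksdb_block_read_byte, rocksdb_block_cache_hit_count
       from information_schema.slow_query
       where query like '%from customer a where C_ACCTBAL%'
       order by time desc limit 5;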