Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 通用场景下tikv的copTask在内存中扫描数据太慢,比mysql、pg等都慢 (in general scenarios, TiKV's copTask scans in-memory data too slowly, slower than MySQL, PG, etc.)
[TiDB Usage Environment] PoC
[TiDB Version] 6.6
[Reproduction Path] Perform a full table scan to observe the behavior of copTask in TiKV
[Encountered Problem: Phenomenon and Impact] copTask scans data too slowly
[Resource Configuration] All operations are single-threaded and run entirely in memory, so resource configuration is not a factor
In general scenarios, I find that scanning data in TiKV is slower than in MySQL or PostgreSQL. If the LSM tree were the cause, other LSM-tree-based databases do not scan any worse than MySQL. In actual tests, the full-table-scan throughput of a single copTask in TiKV is 1-2 times slower than single-threaded MySQL and about 5 times slower than single-threaded PostgreSQL. I would like to ask why TiKV scans are this slow, and whether there is room for optimization in the future?
Since all reads here are served from memory, the block size setting does not matter much, but it is listed for reference. The test statement is: select sum(C_NATIONKEY) from customer;
which performs a full table scan. The table structure is the same in all databases:
mysql> show create table customer \G
*************************** 1. row ***************************
Table: customer
Create Table: CREATE TABLE `customer` (
`C_CUSTKEY` bigint NOT NULL,
`C_NAME` varchar(25) NOT NULL,
`C_ADDRESS` varchar(40) NOT NULL,
`C_NATIONKEY` bigint NOT NULL,
`C_PHONE` char(15) NOT NULL,
`C_ACCTBAL` decimal(15,2) NOT NULL,
`C_MKTSEGMENT` char(10) NOT NULL,
`C_COMMENT` varchar(117) NOT NULL,
PRIMARY KEY (`C_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
TiDB related tests are as follows:
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 5.7.25-TiDB-v6.6.0 |
+--------------------+
1 row in set (0.00 sec)
mysql> show config where name like 'rocksdb%block-size%';
+------+----------------------+------------------------------+-------+
| Type | Instance | Name | Value |
+------+----------------------+------------------------------+-------+
| tikv | 192.168.31.201:20160 | rocksdb.defaultcf.block-size | 32KiB |
| tikv | 192.168.31.201:20160 | rocksdb.lockcf.block-size | 16KiB |
| tikv | 192.168.31.201:20160 | rocksdb.raftcf.block-size | 16KiB |
| tikv | 192.168.31.201:20160 | rocksdb.writecf.block-size | 32KiB |
+------+----------------------+------------------------------+-------+
4 rows in set (0.01 sec)
-- Setting this to 1 makes it easy to time a single copTask; a single copTask consumes 100% of one CPU
mysql> set tidb_distsql_scan_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
mysql> explain analyze select sum(C_NATIONKEY) from customer;
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
| StreamAgg_16 | 1.00 | 1 | root | | time:754.7ms, loops:2 | funcs:sum(Column#11)->Column#9 | 1.45 KB | N/A |
| └─TableReader_17 | 1.00 | 3 | root | | time:754.7ms, loops:2, cop_task: {num: 3, max: 306.3ms, min: 169ms, avg: 251.4ms, p95: 306.3ms, max_proc_keys: 603265, p95_proc_keys: 603265, tot_proc: 751ms, rpc_num: 3, rpc_time: 754.2ms, copr_cache: disabled, build_task_duration: 12.4µs, max_distsql_concurrency: 1} | data:StreamAgg_8 | 442 Bytes | N/A |
| └─StreamAgg_8 | 1.00 | 3 | cop[tikv] | | tikv_task:{proc max:305ms, min:168ms, avg: 250.3ms, p80:305ms, p95:305ms, iters:1466, tasks:3}, scan_detail: {total_process_keys: 1500000, total_process_keys_size: 305225771, total_keys: 1500003, get_snapshot_time: 76.4µs, rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 10004}}} | funcs:sum(tpch.customer.c_nationkey)->Column#11 | N/A | N/A |
| └─TableFullScan_15 | 1500000.00 | 1500000 | cop[tikv] | table:customer | tikv_task:{proc max:246ms, min:132ms, avg: 204ms, p80:246ms, p95:246ms, iters:1466, tasks:3} | keep order:false | N/A | N/A |
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
4 rows in set (0.76 sec)
From the execution plan information total_process_keys: 1500000, total_process_keys_size: 305225771, total_keys: 1500003, we can see that the scan hits almost no un-GC'd MVCC versions; virtually every key read is valid data. From rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 10004}}, we can see that all block accesses are logical reads (block cache hits in memory), 10004 in total. The table scan operator TableFullScan_15 ran as 3 cop tasks, each averaging avg: 204ms, so the total table scan time is about 3 × 204 ms ≈ 612 milliseconds.
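For context, each scanned Region typically becomes one copTask, so tasks: 3 above suggests the 1.5 million rows are spread over 3 Regions. A quick way to confirm this (a sketch, assuming the same tpch database) is:
-- Each scanned Region normally maps to one copTask, so the Region count
-- listed here should match "tasks: 3" in the plan above.
use tpch;
show table customer regions;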
MySQL related tests are as follows:
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.32 |
+-----------+
1 row in set (0.00 sec)
mysql> show variables like '%innodb_page_size%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.00 sec)
mysql> show status like 'Innodb_data_reads';
+-------------------+---------+
| Variable_name | Value |
+-------------------+---------+
| Innodb_data_reads | 3710851 |
+-------------------+---------+
1 row in set (0.01 sec)
mysql> show status like 'Innodb_buffer_pool_read_requests';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_read_requests | 414272227 |
+----------------------------------+-----------+
1 row in set (0.00 sec)
mysql> explain analyze select sum(C_NATIONKEY) from customer;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: sum(customer.C_NATIONKEY) (cost=234307.26 rows=1) (actual time=334.921..334.922 rows=1 loops=1)
-> Table scan on customer (cost=120530.06 rows=1137772) (actual time=0.097..237.381 rows=1500000 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.33 sec)
mysql> show status like 'Innodb_data_reads';
+-------------------+---------+
| Variable_name | Value |
+-------------------+---------+
| Innodb_data_reads | 3710851 |
+-------------------+---------+
1 row in set (0.00 sec)
mysql> show status like 'Innodb_buffer_pool_read_requests';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_read_requests | 414310612 |
+----------------------------------+-----------+
1 row in set (0.00 sec)
The Innodb_data_reads counter is still 3710851 before and after the statement, so no physical reads occurred. The difference in Innodb_buffer_pool_read_requests before and after the statement gives the number of logical reads: 414310612 - 414272227 = **38385**. From the execution information Table scan on customer (cost=120530.06 rows=1137772) (actual time=0.097..237.381 rows=1500000 loops=1), the actual table scan time is 237.381 milliseconds.
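The 38385 figure comes from diffing the two counter snapshots shown above; the measurement can be repeated with the following pattern (a sketch; the counter is instance-wide, so the server should be otherwise idle):
-- 1. Snapshot the logical-read counter before the statement.
show status like 'Innodb_buffer_pool_read_requests';
-- 2. Run the statement under test.
select sum(C_NATIONKEY) from customer;
-- 3. Snapshot again; the difference is the number of 16 KiB page accesses
--    (here 414310612 - 414272227 = 38385; a page touched twice counts twice).
show status like 'Innodb_buffer_pool_read_requests';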
PostgreSQL related tests are as follows:
tpch=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 15.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
tpch=# set max_parallel_workers_per_gather=0;
SET
tpch=# set track_io_timing=on;
SET
tpch=# SELECT current_setting('block_size');
current_setting
-----------------
8192
(1 row)
tpch=# explain (analyze true,buffers true,verbose true,timing) select sum(C_NATIONKEY) from customer;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=56484.90..56484.91 rows=1 width=32) (actual time=234.525..234.526 rows=1 loops=1)
Output: sum(c_nationkey)
Buffers: shared hit=37734
-> Seq Scan on public.customer (cost=0.00..52734.72 rows=1500072 width=8) (actual time=0.012..111.046 rows=1500000 loops=1)
Output: c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment
Buffers: shared hit=37734
Planning Time: 0.060 ms
Execution Time: 234.556 ms
(8 rows)
The execution information shows that all reads are logical reads from the buffer cache, 37734 in total (Buffers: shared hit=37734). From the table scan line Seq Scan on public.customer (cost=0.00..52734.72 rows=1500072 width=8) (actual time=0.012..111.046 rows=1500000 loops=1), the single-threaded table scan took 111.046 milliseconds.
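To put the three results side by side, here is a rough back-of-the-envelope comparison built only from the numbers above (a sketch: it treats the ~305 MB reported by TiKV's total_process_keys_size as the common payload, which is only approximate across different storage formats):
-- Approximate single-threaded scan throughput for the same 1.5 million rows.
select
    305225771.0 / 1500000           as avg_row_bytes,     -- ≈ 203 bytes per row
    305225771.0 / 1048576 / 0.612   as tikv_mib_per_sec,  -- 3 tasks × 204 ms ≈ 612 ms -> ≈ 476 MiB/s
    305225771.0 / 1048576 / 0.237   as mysql_mib_per_sec, -- 237.381 ms                -> ≈ 1228 MiB/s
    305225771.0 / 1048576 / 0.111   as pg_mib_per_sec;    -- 111.046 ms                -> ≈ 2622 MiB/s
These ratios are only as meaningful as the assumption that all three engines scan roughly the same number of bytes, but they line up with the wall-clock gaps described at the top of the post.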
When parallelism is enabled (e.g., TiDB's default tidb_distsql_scan_concurrency=15), the other databases also accelerate well, almost linearly given enough CPU, and some hybrid row-column databases do even better. A longer in-memory full scan means more CPU time, so the same amount of processing requires more resources. Therefore, I would like to ask: why does a TiKV copTask spend so much time on logical reads for the same data volume, and is there room for optimization in the future?
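For completeness, the session settings used above to force single-threaded execution can be reverted as follows (a sketch; the values shown are the documented defaults):
-- TiDB session: restore the default cop task scan concurrency.
set tidb_distsql_scan_concurrency = 15;
-- PostgreSQL session: restore the default number of parallel workers per Gather node.
set max_parallel_workers_per_gather = 2;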