Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: min(id)为什么走全表扫描
[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
The table structure is:
Create Table: CREATE TABLE `data` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`req_id` char(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`source_from` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'Data Source',
`source_type` varchar(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '1' COMMENT 'Data Type',
`create_at` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'Creation Time',
`update_at` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'Update Time',
`upload_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Upload Time',
PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
KEY `idx_created_at` (`create_at`),
KEY `idx_updated_at` (`update_at`),
KEY `i_req_id` (`req_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=3256369687
1 row in set (0.00 sec)
The query is:
SELECT
min(id)
FROM
data;
Execution Plan:
mysql> explain SELECT max(id) FROM data;
+------------------------------+---------+-----------+-----------------+----------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+-----------------+----------------------------------------------+
| StreamAgg_9 | 1.00 | root | | funcs:max(order.data.id)->Column#12 |
| └─Limit_13 | 1.00 | root | | offset:0, count:1 |
| └─TableReader_24 | 1.00 | root | | data:Limit_23 |
| └─Limit_23 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─TableFullScan_22 | 1.00 | cop[tikv] | table:data | keep order:true, desc |
+------------------------------+---------+-----------+-----------------+----------------------------------------------+
5 rows in set (0.00 sec)
In reality, it’s not a full table scan; it’s just that the operator shows TableFullScan (because there’s no other operator to represent a table scan ). In fact, it stops once the maximum value is obtained. From the execution plan, you can see that there is a limit operator with count:1 above, and the TableFullScan has keep order: true.
For min/max optimization, refer to: Max/Min 函数消除规则 | PingCAP 文档中心
During logical optimization, SQL will be rewritten, and min will be changed to order by id limit 1.
Under normal circumstances, the index column can be used. How many rows of data does this table have?
However, the execution time is very long, over 40 seconds.
Take a look at the actual plan with explain analyze.
Take out the actual execution plan.
The execution plan shows that the number of rows written is 1. The clustered table is ordered, so you can directly fetch the smallest value from the table. If the execution is slow, there might be an issue. Have you deleted a large amount of data?
Use EXPLAIN ANALYZE SELECT max(id) FROM data;
to check the actual execution plan.
Isn’t the number of affected rows always 1 row?
It is indeed abnormal for this amount of data to take more than 40 seconds. Could it be a read hotspot issue? Try changing the primary key type by modifying id
bigint(20) unsigned NOT NULL AUTO_INCREMENT to AUTO_RANDOM.
+------------------------------+---------+---------+-----------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+------------------------------+---------+---------+-----------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+-----------+------+
| StreamAgg_9 | 1.00 | 1 | root | | time:1m0.7s, loops:2 | funcs:min(order.data.id)->Column#12 | 160 Bytes | N/A |
| └─Limit_13 | 1.00 | 1 | root | | time:1m0.7s, loops:2 | offset:0, count:1 | N/A | N/A |
| └─TableReader_24 | 1.00 | 1 | root | | time:1m0.7s, loops:1, cop_task: {num: 221, max: 491.7ms, min: 584.9µs, avg: 274.4ms, p95: 419.9ms, max_proc_keys: 1, p95_proc_keys: 0, tot_proc: 1m0.3s, tot_wait: 23ms, rpc_num: 221, rpc_time: 1m0.6s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 1} | data:Limit_23 | 547 Bytes | N/A |
| └─Limit_23 | 1.00 | 1 | cop[tikv] | | tikv_task:{proc max:491ms, min:0s, avg: 273.5ms, p80:409ms, p95:419ms, iters:221, tasks:221}, scan_detail: {total_process_keys: 1, total_process_keys_size: 27, total_keys: 153800520, get_snapshot_time: 108ms, rocksdb: {delete_skipped_count: 14402724, key_skipped_count: 168203023, block: {cache_hit_count: 2191, read_count: 240664, read_byte: 2.66 GB, read_time: 11.2s}}} | offset:0, count:1 | N/A | N/A |
| └─TableFullScan_22 | 1.00 | 1 | cop[tikv] | table:data | tikv_task:{proc max:491ms, min:0s, avg: 273.5ms, p80:409ms, p95:419ms, iters:221, tasks:221} | keep order:true | N/A | N/A |
+------------------------------+---------+---------+-----------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+-----------+------+
5 rows in set (1 min 0.77 sec)
There is a pt-archiver tool running, deleting expired data. The first run was very fast, and subsequent runs are also very fast.
Without a WHERE condition, a direct FROM t will result in either a table full scan or an index full scan.
CLUSTERED
means that the primary key of the table is a clustered index. In a clustered index table, the key of the row data is composed of the primary key column data provided by the user, so a clustered index table stores at least one key-value pair per row, i.e.,
- Primary key column data (key) - Row data (value)
You have created a clustered index table. There is no separate primary key index; the primary key is row_id. Therefore, it will directly perform a table full scan.
Well, but the execution time is long. What is the reason?
Let’s see a complete execution plan.
rpc_time: 1 minute 0.6 seconds
How about changing distsql_concurrency to 15?