Issues with Index Selection in Query Plans

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 查询计划中索引选择的问题

| username: EricSong

[TiDB Usage Environment] Test
[TiDB Version] v6.5.0
[Reproduction Path]
Create table as follows

CREATE TABLE `ls` (
  `timestamp` timestamp(3) NULL DEFAULT NULL,
  `organization_id` varchar(61) NOT NULL,
  `cluster_name` varchar(100) DEFAULT NULL,
  `cluster_id` varchar(61) NOT NULL,
  `host_name` varchar(255) DEFAULT NULL,
  `host_ip` varchar(20) DEFAULT NULL,
   ...
  `service_state` float DEFAULT NULL,
  UNIQUE KEY `timestamp` (`timestamp`,`organization_id`,`cluster_id`,`host_name`,`host_ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

The table contains 5GB of data, with more than 30 million rows, and the timestamp spans about 1 year.

[Encountered Problem: Phenomenon and Impact]
As soon as the query's time range exceeds roughly 5 days, the plan switches to a full table scan; only ranges shorter than about 5 days use the index.

Less than 5 days
explain SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-27 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                         |estRows  |task     |access object                                                                        |operator info                                                            |
---------------------------+---------+---------+-------------------------------------------------------------------------------------+-------------------------------------------------------------------------+
Projection_4               |362228.19|root     |                                                                                     |ls.service_state, ls.timestamp     |
└─IndexLookUp_10           |362228.19|root     |                                                                                     |                                                                         |
  ├─IndexRangeScan_8(Build)|362228.19|cop[tikv]|table:ls, index:timestamp(timestamp, organization_id, cluster_id, host_name, host_ip)|range:(2024-04-27 01:32:47.000,2024-04-30 01:32:47.000), keep order:false|
  └─TableRowIDScan_9(Probe)|362228.19|cop[tikv]|table:ls                                                                             |keep order:false                                                         |
---
Greater than 5 days
explain SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                   |estRows    |task     |access object|operator info                                                                                                                   |
---------------------+-----------+---------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
Projection_4         |629602.92  |root     |             |ls.service_state, ls.timestamp                                                            |
└─TableReader_7      |629602.92  |root     |             |data:Selection_6                                                                                                                |
  └─Selection_6      |629602.92  |cop[tikv]|             |gt(ls.timestamp, 2024-04-25 01:32:47.000000), lt(ls.timestamp, 2024-04-30 01:32:47.000000)|
    └─TableFullScan_5|34892667.00|cop[tikv]|table:ls     |keep order:false                                                                                                                |

The table's statistics health is 100.
My understanding is that as long as the rows read from the index plus the rows fetched by the table lookup are fewer than the table's total row count, the index plan should in theory be cheaper. Why does the optimizer stop using the index once the range exceeds 5 days? Is there some optimization mechanism at work here?
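For reference, a sketch of checking and refreshing the table's statistics with the standard TiDB statements (shown only as a pointer; not necessarily the exact commands used here):

SHOW STATS_HEALTHY WHERE Table_name = 'ls';
SHOW STATS_META WHERE Table_name = 'ls';
ANALYZE TABLE ls;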

| username: Kongdom | Original post link

How about running EXPLAIN ANALYZE on the SELECT?
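For example, a sketch reusing the query from the original post:

explain analyze SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"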

| username: zhaokede | Original post link

Try looking at the execution plan over a different time interval. Is it possible that the data volume is particularly large during this period, causing a full table scan?

| username: YuchongXU | Original post link

Test several different times and check the execution plan.

| username: TIDB-Learner | Original post link

How about using BETWEEN ... AND for the continuous time range?
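For example, a sketch of the same query rewritten with BETWEEN (note that BETWEEN is inclusive on both ends, unlike the original > / < predicates):

explain SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp BETWEEN "2024-04-25 01:32:47" AND "2024-04-30 01:32:47"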

| username: lemonade010 | Original post link

I have run into the same problem. Using >= or BETWEEN makes no difference. It is not clear exactly which cost conditions trigger the full table scan.

| username: tidb菜鸟一只 | Original post link

Try running EXPLAIN ANALYZE to see the actual execution. It's possible that the EXPLAIN estimates are off; scanning the entire table to filter roughly 600,000 rows out of more than 30 million doesn't seem right.

| username: yytest | Original post link

Can you add a hint to specify the index explicitly?

| username: DBAER | Original post link

Force the use of an index and compare the number of rows.

| username: 友利奈绪 | Original post link

TiDB can force the use of a specific index for comparison.
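For example, a sketch using the USE_INDEX hint, with the index name `timestamp` taken from the table definition above:

explain analyze SELECT /*+ USE_INDEX(ls, timestamp) */ ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"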

| username: EricSong | Original post link

The EXPLAIN ANALYZE output is below; it looks consistent with the plain EXPLAIN result:

explain analyze SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-27 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                         |estRows  |actRows|task     |access object                                                                        |execution info                                                                                                                                                                                                                                                 |operator info                                                            |memory  |disk|
---------------------------+---------+-------+---------+-------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+--------+----+
Projection_4               |362228.19|381485 |root     |                                                                                     |time:367.7ms, loops:374, Concurrency:5                                                                                                                                                                                                                         |ls.service_state, ls.timestamp     |137.3 KB|N/A |
└─IndexLookUp_10           |362228.19|381485 |root     |                                                                                     |time:366.7ms, loops:374, index_task: {total_time: 334.7ms, fetch_handle: 334.4ms, build: 63µs, wait: 313.3µs}, table_task: {total_time: 554.5ms, num: 23, concurrency: 5}                                                                                      |                                                                         |3.30 MB |N/A |
  ├─IndexRangeScan_8(Build)|362228.19|381485 |cop[tikv]|table:ls, index:timestamp(timestamp, organization_id, cluster_id, host_name, host_ip)|time:289.3ms, loops:379, cop_task: {num: 24, max: 75.2ms, min: 1.64ms, avg: 22.5ms, p95: 61.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 487ms, tot_wait: 1ms, rpc_num: 24, rpc_time: 540.6ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 1|range:(2024-04-27 01:32:47.000,2024-04-30 01:32:47.000), keep order:false|N/A     |N/A |
  └─TableRowIDScan_9(Probe)|362228.19|381485 |cop[tikv]|table:ls                                                                             |time:367.3ms, loops:413, cop_task: {num: 37, max: 29.8ms, min: 2.1ms, avg: 12.1ms, p95: 23.8ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 372ms, tot_wait: 1ms, rpc_num: 37, rpc_time: 447.8ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15|keep order:false                                                         |N/A     |N/A |
--------------------------------------------------------------------------------
explain analyze SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                   |estRows    |actRows |task     |access object|execution info                                                                                                                                                                                                                                                 |operator info                                                                                                                   |memory  |disk|
---------------------+-----------+--------+---------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+--------+----+
Projection_4         |629602.92  |640142  |root     |             |time:1.94s, loops:630, Concurrency:5                                                                                                                                                                                                                           |ls.service_state, ls.timestamp                                                            |129.2 KB|N/A |
└─TableReader_7      |629602.92  |640142  |root     |             |time:1.94s, loops:630, cop_task: {num: 97, max: 521.9ms, min: 868.1µs, avg: 271.6ms, p95: 477.4ms, max_proc_keys: 554792, p95_proc_keys: 521241, tot_proc: 25.8s, tot_wait: 342ms, rpc_num: 97, rpc_time: 26.3s, copr_cache_hit_ratio: 0.00, distsql_concurrenc|data:Selection_6                                                                                                                |1.15 MB |N/A |
  └─Selection_6      |629602.92  |640142  |cop[tikv]|             |tikv_task:{proc max:505ms, min:0s, avg: 266.2ms, p80:379ms, p95:464ms, iters:34498, tasks:97}, scan_detail: {total_process_keys: 34892667, total_process_keys_size: 7457693379, total_keys: 34892764, get_snapshot_time: 2.78ms, rocksdb: {key_skipped_count: 3|gt(ls.timestamp, 2024-04-25 01:32:47.000000), lt(ls.timestamp, 2024-04-30 01:32:47.000000)|N/A     |N/A |
    └─TableFullScan_5|34892667.00|34892667|cop[tikv]|table:ls     |tikv_task:{proc max:408ms, min:0s, avg: 192.7ms, p80:276ms, p95:348ms, iters:34498, tasks:97}                                                                                                                                                                  |keep order:false                                                                                                                |N/A     |N/A |

| username: EricSong | Original post link

An optimizer hint can force the index, but I'm curious why a full table scan is still chosen when the number of index rows plus the number of table-lookup rows is smaller than the number of rows a full table scan reads. I found some information on the cost model, but I still don't understand why the estimated cost of the index range scan plus the table lookup would come out higher than that of a full table scan.
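One way to see the numbers the cost model is actually comparing is the verbose plan format, which adds an estimated-cost column (a sketch, assuming EXPLAIN FORMAT = 'verbose' is available in this version); running it with and without the hint shows both estimates:

explain format = 'verbose' SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
explain format = 'verbose' SELECT /*+ USE_INDEX(ls, timestamp) */ ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"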

| username: EricSong | Original post link

This is the result with the index forced. The total number of scanned rows is clearly smaller when the index is used, but I don't know why the optimizer doesn't choose it automatically.

explain analyze SELECT /*+ USE_INDEX(ls, timestamp) */ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                         |estRows  |actRows|task     |access object                                                                        |execution info                                                                                                                                                                                                                                                 |operator info                                                            |memory  |disk|
---------------------------+---------+-------+---------+-------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+--------+----+
Projection_4               |629602.92|640142 |root     |                                                                                     |time:495.7ms, loops:627, Concurrency:5                                                                                                                                                                                                                         |ls.service_state, ls.timestamp     |137.4 KB|N/A |
└─IndexLookUp_7            |629602.92|640142 |root     |                                                                                     |time:494.4ms, loops:627, index_task: {total_time: 472.1ms, fetch_handle: 435.6ms, build: 82.7µs, wait: 36.4ms}, table_task: {total_time: 901.1ms, num: 35, concurrency: 5}                                                                                     |                                                                         |3.33 MB |N/A |
  ├─IndexRangeScan_5(Build)|629602.92|640142 |cop[tikv]|table:ls, index:timestamp(timestamp, organization_id, cluster_id, host_name, host_ip)|time:366.3ms, loops:631, cop_task: {num: 37, max: 77.8ms, min: 394.2µs, avg: 14.5ms, p95: 67.9ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 481ms, rpc_num: 37, rpc_time: 536.5ms, copr_cache_hit_ratio: 0.27, distsql_concurrency: 15}, tikv_task:|range:(2024-04-25 01:32:47.000,2024-04-30 01:32:47.000), keep order:false|N/A     |N/A |
  └─TableRowIDScan_6(Probe)|629602.92|640142 |cop[tikv]|table:ls                                                                             |time:631.5ms, loops:679, cop_task: {num: 46, max: 30ms, min: 1.58ms, avg: 15ms, p95: 27.6ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 584ms, tot_wait: 1ms, rpc_num: 46, rpc_time: 688.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, |keep order:false                                                         |N/A     |N/A |
------------------------------------------
explain analyze SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                   |estRows    |actRows |task     |access object|execution info                                                                                                                                                                                                                                                 |operator info                                                                                                                   |memory  |disk|
---------------------+-----------+--------+---------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+--------+----+
Projection_4         |629602.92  |640142  |root     |             |time:814.8ms, loops:630, Concurrency:5                                                                                                                                                                                                                         |ls.service_state, ls.timestamp                                                            |128.9 KB|N/A |
└─TableReader_7      |629602.92  |640142  |root     |             |time:812.6ms, loops:630, cop_task: {num: 97, max: 556.6ms, min: 772µs, avg: 88.5ms, p95: 471.2ms, max_proc_keys: 554792, p95_proc_keys: 485769, tot_proc: 8.36s, tot_wait: 40ms, rpc_num: 97, rpc_time: 8.58s, copr_cache_hit_ratio: 0.52, distsql_concurrency:|data:Selection_6                                                                                                                |1.15 MB |N/A |
  └─Selection_6      |629602.92  |640142  |cop[tikv]|             |tikv_task:{proc max:555ms, min:1ms, avg: 280.4ms, p80:411ms, p95:478ms, iters:34498, tasks:97}, scan_detail: {total_process_keys: 9781457, total_process_keys_size: 2115995946, total_keys: 9781504, get_snapshot_time: 62.7ms, rocksdb: {key_skipped_count: 97|gt(ls.timestamp, 2024-04-25 01:32:47.000000), lt(ls.timestamp, 2024-04-30 01:32:47.000000)|N/A     |N/A |
    └─TableFullScan_5|34892667.00|34892667|cop[tikv]|table:ls     |tikv_task:{proc max:449ms, min:1ms, avg: 205.4ms, p80:305ms, p95:370ms, iters:34498, tasks:97}                                                                                                                                                                 |keep order:false                                                                                                                |N/A     |N/A |

| username: EricSong | Original post link

It does indeed seem related to the amount of data in the time range, but I don't understand why. Even when the rows read from the index plus the table-lookup rows are an order of magnitude fewer than the rows in a full table scan, would the former really take longer in practice?
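A related knob that may be worth experimenting with is the session variable that biases the optimizer toward range scans (a sketch; it is assumed here that tidb_opt_prefer_range_scan is available in this version and defaults to OFF):

SET SESSION tidb_opt_prefer_range_scan = ON;
explain SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"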

| username: xfworld | Original post link

Is the data in this table changing significantly?
I noticed that during the full table scan, the number of keys processed is almost identical to the total number of keys:
total_process_keys: 34,892,667, total_keys: 34,892,770

It looks like the timestamp index is acting as a hotspot index and isn't effective here. You might need to try building the index a different way; it's worth a shot.

| username: zhaokede | Original post link

In terms of business, can we switch to a more efficient index or create a new composite index?
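For example, a covering composite index would let this particular query be answered from the index alone, with no table lookup at all (a sketch; the index name idx_ts_state is made up for illustration):

ALTER TABLE ls ADD INDEX idx_ts_state (`timestamp`, `service_state`);
explain SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"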

| username: EricSong | Original post link

But what puzzles me is that forcing the index actually gives a good result.

This is the result with the index forced. The total number of scanned rows is still smaller when the index is used, but I don't know why the optimizer doesn't choose it automatically. Maybe it's because the index isn't built on the timestamp alone, or for some other reason TiDB rejected it after cost evaluation. I still don't understand, though, how a full table scan can be evaluated as the better plan when it reads one or even two orders of magnitude more rows.

explain analyze SELECT /*+ USE_INDEX(ls, timestamp) */ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                         |estRows  |actRows|task     |access object                                                                        |execution info                                                                                                                                                                                                                                                 |operator info                                                            |memory  |disk|
---------------------------+---------+-------+---------+-------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+--------+----+
Projection_4               |629602.92|640142 |root     |                                                                                     |time:495.7ms, loops:627, Concurrency:5                                                                                                                                                                                                                         |ls.service_state, ls.timestamp     |137.4 KB|N/A |
└─IndexLookUp_7            |629602.92|640142 |root     |                                                                                     |time:494.4ms, loops:627, index_task: {total_time: 472.1ms, fetch_handle: 435.6ms, build: 82.7µs, wait: 36.4ms}, table_task: {total_time: 901.1ms, num: 35, concurrency: 5}                                                                                     |                                                                         |3.33 MB |N/A |
  ├─IndexRangeScan_5(Build)|629602.92|640142 |cop[tikv]|table:ls, index:timestamp(timestamp, organization_id, cluster_id, host_name, host_ip)|time:366.3ms, loops:631, cop_task: {num: 37, max: 77.8ms, min: 394.2µs, avg: 14.5ms, p95: 67.9ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 481ms, rpc_num: 37, rpc_time: 536.5ms, copr_cache_hit_ratio: 0.27, distsql_concurrency: 15}, tikv_task:|range:(2024-04-25 01:32:47.000,2024-04-30 01:32:47.000), keep order:false|N/A     |N/A |
  └─TableRowIDScan_6(Probe)|629602.92|640142 |cop[tikv]|table:ls                                                                             |time:631.5ms, loops:679, cop_task: {num: 46, max: 30ms, min: 1.58ms, avg: 15ms, p95: 27.6ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 584ms, tot_wait: 1ms, rpc_num: 46, rpc_time: 688.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, |keep order:false                                                         |N/A     |N/A |
------------------------------------------
explain analyze SELECT ls.service_state, ls.timestamp FROM ls WHERE ls.timestamp > "2024-04-25 01:32:47" and ls.timestamp < "2024-04-30 01:32:47"
id                   |estRows    |actRows |task     |access object|execution info                                                                                                                                                                                                                                                 |operator info                                                                                                                   |memory  |disk|
---------------------+-----------+--------+---------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+--------+----+
Projection_4         |629602.92  |640142  |root     |             |time:814.8ms, loops:630, Concurrency:5                                                                                                                                                                                                                         |ls.service_state, ls.timestamp                                                            |128.9 KB|N/A |
└─TableReader_7      |629602.92  |640142  |root     |             |time:812.6ms, loops:630, cop_task: {num: 97, max: 556.6ms, min: 772µs, avg: 88.5ms, p95: 471.2ms, max_proc_keys: 554792, p95_proc_keys: 485769, tot_proc: 8.36s, tot_wait: 40ms, rpc_num: 97, rpc_time: 8.58s, copr_cache_hit_ratio: 0.52, distsql_concurrency:|data:Selection_6                                                                                                                |1.15 MB |N/A |
  └─Selection_6      |629602.92  |640142  |cop[tikv]|             |tikv_task:{proc max:555ms, min:1ms, avg: 280.4ms, p80:411ms, p95:478ms, iters:34498, tasks:97}, scan_detail: {total_process_keys: 9781457, total_process_keys_size: 2115995946, total_keys: 9781504, get_snapshot_time: 62.7ms, rocksdb: {key_skipped_count: 97|gt(ls.timestamp, 2024-04-25 01:32:47.000000), lt(ls.timestamp, 2024-04-30 01:32:47.000000)|N/A     |N/A |
    └─TableFullScan_5|34892667.00|34892667|cop[tikv]|table:ls     |tikv_task:{proc max:449ms, min:1ms, avg: 205.4ms, p80:305ms, p95:370ms, iters:34498, tasks:97}                                                                                                                                                                 |keep order:false                                                                                                                |N/A     |N/A |

| username: EricSong | Original post link

That is possible, but I'm not sure why it doesn't use the index when doing so is clearly better, and the statistics health is 100.

| username: 不想干活 | Original post link

Is it caused by a large amount of data being processed?

| username: xfworld | Original post link

If you have the chance, try upgrading to version 6.5.9 and see if it works.