Will using LIMIT in TiDB queries have optimization effects? It was found that even after using LIMIT, a large number of keys were still scanned

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB查询使用limit,会有优化效果吗,发现使用limit之后,还是扫描了非常多的key

| username: TiDBer_C33

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] v4.0.10
[Reproduction Path] What operations were performed that caused the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]
The SQL has an index, and the execution plan also uses the index.
image

| username: TiDBer_C33 | Original post link

SQL is limit 100

| username: DBAER | Original post link

Normally, it should push down the filtering to the TiKV layer, right?

| username: TiDBer_C33 | Original post link

Yes, looking at the execution plan, the limit should be applied at the KV layer. I’m not sure why it still scans so many keys.

| username: TiDBer_0p0BD6le | Original post link

Which aspect of optimization are you referring to? LIMIT itself has pushdown optimization, but this is not a means of SQL optimization. Even with LIMIT, a lot of data may still be scanned.

| username: zhaokede | Original post link

It is also related to the query conditions.

| username: 小龙虾爱大龙虾 | Original post link

Send the execution plan text.

| username: shigp_TIDBER | Original post link

Post the execution plan and let’s take a look.

| username: tidb菜鸟一只 | Original post link

Please provide the entire SQL and execution plan, otherwise it’s impossible to make a judgment. For example, in the query SELECT a FROM t WHERE b > 10 LIMIT 100, if there’s no index on column b, wouldn’t it have to scan the entire table to find 100 rows where b > 10? In that case, how much data it scans would entirely depend on how your data is stored, right?

| username: TiDBer_RjzUpGDL | Original post link

Is there an index on the query condition?

| username: Soysauce520 | Original post link

Is the line above this one in the execution plan a TopN?

| username: xfworld | Original post link

Upgrade it, the version is too old…

The old version has limited support for operators.

| username: hacker_77powerful | Original post link

If you add “order by xx limit 100”, it will still scan many rows.
It mainly depends on the SQL statement you execute.

| username: Jellybean | Original post link

TiDB uses limit, which in most cases will be pushed down to the storage layer for filtering to improve query performance. However, in some scenarios, to ensure the accuracy of the results, it will not be pushed down in advance.

Please post your specific SQL and execution plan so that we can further analyze it specifically.

| username: 友利奈绪 | Original post link

You need to look at the specific execution plan.

| username: madcoder | Original post link

Let’s all guess the riddle together :joy:

| username: TIDB-Learner | Original post link

If the query condition has no index, add an index and use >= instead of >. If the index is sorted, do not sort the query in the reverse order. It is better to clearly state the query statement, table structure, table data, and other information, and send the result of the explain table.

| username: TiDBer_C33 | Original post link

CREATE TABLE xxxxxx (
id bigint(20) NOT NULL AUTO_INCREMENT COMMENT ‘id’,
uid bigint(20) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
people_task_id int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
task_group_id int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
task_id int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
sub_task_id int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
sub_task_type int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
cycle_id int(11) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
progress_compute_end_time bigint(20) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
next_compute_time bigint(20) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
is_deleted tinyint(4) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
ctime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘xx’,
mtime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘xx’,
gid bigint(20) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
crowd_dimension tinyint(4) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
last_compute_time bigint(20) NOT NULL DEFAULT ‘0’ COMMENT ‘xx’,
PRIMARY KEY (id),
KEY idx_next_compute_time (next_compute_time),
KEY idx_progress_compute_end_time (progress_compute_end_time),
KEY idx_uid (uid),
KEY idx_people_task_id (people_task_id),
KEY idx_mtime (mtime),
UNIQUE KEY uk_uid_sub_task_id_cycle_id_gid (uid,sub_task_id,cycle_id,gid),
KEY idx_last_next_compute_time (last_compute_time,next_compute_time),
KEY idx_sub_task_type_last_next_compute_time (sub_task_type,last_compute_time,next_compute_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=14700001 COMMENT=‘xx’

| username: TiDBer_C33 | Original post link

SELECT `id`,`uid`,`gid`,`crowd_dimension`,`people_task_id`,`task_group_id`,`task_id`,`sub_task_id`,`sub_task_type`,`cycle_id`,`progress_compute_end_time`,`next_compute_time`,`is_deleted`,`ctime`,`mtime`,`last_compute_time` 
FROM `anchor_task_center_progress_schedule_execute_info_0000` 
WHERE ((`last_compute_time`=0 AND `next_compute_time`>=1712825580) AND `next_compute_time`<=1716281580) 
AND `sub_task_type` IN (46,47,48,49,50,51,52,53,54,55,56,57,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,76,77,78) 
LIMIT 100;
| username: TiDBer_C33 | Original post link

Index used: idx_sub_task_type_last_next_compute_time