Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: id排序引起查询慢的问题
[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
6.5.0
[Reproduction Path] What operations were performed to encounter the issue
A table with 5 million rows, sorting by id in ascending order is 3 times slower than not sorting. Is there a way to solve this?
[Encountered Issue: Problem Phenomenon and Impact]
SELECT * from mytable order by id asc limit 5000000,10
Takes 3 seconds
SELECT * from mytable order by id desc limit 5000000,10
Takes 3.5 seconds
SELECT * from ai_gis_tilt limit 5000000,10
Takes 1.1 seconds
Is there a way to speed up the id desc query?
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
This is related to your offset.
Is there an index on the ID?
Sorting without using ID is faster because it eliminates the sorting operation. You can check the specific differences in the execution plan.
Add an index on the id. If the returned columns are fixed, you can create a composite index.
The id is the primary key. Of course, there is an index. Even if I add another unique index, the effect is the same.
Here is the explain:
Limit_13 |
10.00 |
root |
|
offset:5000000, count:10 |
└─TableReader_23 |
5000010.00 |
root |
|
data:TableFullScan_22 |
└─TableFullScan_22 |
5000010.00 |
cop[tikv] |
table:my_table |
keep order:true, desc |
Explanation when not sorted:
└─TableReader_11 |
5000010.00 |
root |
|
data:Limit_10 |
└─Limit_10 |
5000010.00 |
cop[tikv] |
|
offset:0, count:5000010 |
└─TableFullScan_9 |
5000010.00 |
cop[tikv] |
table:my_table |
keep order:false |
Comparing the above results, it can be seen that an order is missing. |
|
|
|
|
The id is the primary key and should have a default index (a unique index was also added later). Can it be optimized?
Please execute the three statements using EXPLAIN ANALYZE
to see the results.
What I mean is that if the columns you return are fixed and not many, you can create a composite index. For example, for the query SELECT id, zhibo_date FROM paiban ORDER BY id DESC LIMIT 500000, 10;
, you can create an index like CREATE INDEX idx_paiban_id_zhibo_date ON paiban(id, zhibo_date);
. This kind of index will be much better. If you query all columns, even though your id column has an index, querying other columns will still require a table scan.
The ID is the primary key. Looking at the execution plan, a table scan occurs, which is compact and does not require a back table lookup.
When retrieving data while maintaining order, i.e., when the execution plan has keep order: true, the concurrency of cop_task for data retrieval is 2 (hardcoded in the code). This is to avoid excessive memory consumption in TiDB due to data accumulation caused by waiting for ordered data. If data order is not guaranteed, in scenarios with limit M, N, if M is large enough, tidb_distsql_scan_concurrency can be used to concurrently retrieve data, which will be more efficient. Therefore, when the values of limit M, N are relatively small, the efficiency should be similar, but when the values are relatively large, there will be some efficiency differences, mainly due to the concurrency issue. There is no way to improve the performance of such statements.