The issue of slow queries caused by ID sorting

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: id排序引起查询慢的问题

| username: TiDBer_dTajNAyh

[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
6.5.0

[Reproduction Path] What operations were performed to encounter the issue
A table with 5 million rows, sorting by id in ascending order is 3 times slower than not sorting. Is there a way to solve this?

[Encountered Issue: Problem Phenomenon and Impact]

SELECT * from mytable order by id asc limit 5000000,10
Takes 3 seconds

SELECT * from mytable order by id desc limit 5000000,10
Takes 3.5 seconds

SELECT * from ai_gis_tilt limit 5000000,10
Takes 1.1 seconds

Is there a way to speed up the id desc query?

[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

| username: songxuecheng | Original post link

This is related to your offset.

| username: weixiaobing | Original post link

Is there an index on the ID?
Sorting without using ID is faster because it eliminates the sorting operation. You can check the specific differences in the execution plan.

| username: tidb菜鸟一只 | Original post link

Add an index on the id. If the returned columns are fixed, you can create a composite index.

| username: TiDBer_dTajNAyh | Original post link

The id is the primary key. Of course, there is an index. Even if I add another unique index, the effect is the same.

Here is the explain:

Limit_13 10.00 root offset:5000000, count:10
└─TableReader_23 5000010.00 root data:TableFullScan_22
└─TableFullScan_22 5000010.00 cop[tikv] table:my_table keep order:true, desc
| username: TiDBer_dTajNAyh | Original post link

Explanation when not sorted:

└─TableReader_11 5000010.00 root data:Limit_10
└─Limit_10 5000010.00 cop[tikv] offset:0, count:5000010
└─TableFullScan_9 5000010.00 cop[tikv] table:my_table keep order:false
Comparing the above results, it can be seen that an order is missing.

The id is the primary key and should have a default index (a unique index was also added later). Can it be optimized?

| username: weixiaobing | Original post link

Please execute the three statements using EXPLAIN ANALYZE to see the results.

| username: tidb菜鸟一只 | Original post link

What I mean is that if the columns you return are fixed and not many, you can create a composite index. For example, for the query SELECT id, zhibo_date FROM paiban ORDER BY id DESC LIMIT 500000, 10;, you can create an index like CREATE INDEX idx_paiban_id_zhibo_date ON paiban(id, zhibo_date);. This kind of index will be much better. If you query all columns, even though your id column has an index, querying other columns will still require a table scan.

| username: 人如其名 | Original post link

The ID is the primary key. Looking at the execution plan, a table scan occurs, which is compact and does not require a back table lookup.

| username: 人如其名 | Original post link

When retrieving data while maintaining order, i.e., when the execution plan has keep order: true, the concurrency of cop_task for data retrieval is 2 (hardcoded in the code). This is to avoid excessive memory consumption in TiDB due to data accumulation caused by waiting for ordered data. If data order is not guaranteed, in scenarios with limit M, N, if M is large enough, tidb_distsql_scan_concurrency can be used to concurrently retrieve data, which will be more efficient. Therefore, when the values of limit M, N are relatively small, the efficiency should be similar, but when the values are relatively large, there will be some efficiency differences, mainly due to the concurrency issue. There is no way to improve the performance of such statements.