Scan Performance Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: scan性能问题

| username: TiDBer_i3pbMJ65

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Problem Phenomenon and Impact]
Two range queries return the same data, one with an end range and one without, but the latency difference is huge.

  1. Query over the range (start, end) returns quickly

     (1) There are only 10 rows in this range;
     (2) obj_key is the primary key;
     (3) The primary key is clustered.

  2. Query over the range (start, no end) returns slowly

     (1) There are 1 billion rows in this range, but the query has LIMIT 10;
     (2) obj_key is the primary key;
     (3) The primary key is clustered.

The start of the range is the same in both scenarios, and the returned data is identical.
Repeated runs give the same result (ruling out cache-hit differences).
Why is the latency difference so large?
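
For reference, a minimal sketch of the two query shapes described above. The table name t and the literal start/end values are assumptions; the post only states that obj_key is the clustered primary key and that LIMIT 10 is used.

```sql
-- Query 1: bounded range (start, end) -- returns quickly;
-- this range contains only 10 rows.
SELECT * FROM t
WHERE obj_key > 'start_key' AND obj_key < 'end_key';

-- Query 2: unbounded range (start, +inf) -- returns slowly;
-- the range contains about 1 billion rows, limited to 10.
SELECT * FROM t
WHERE obj_key > 'start_key'
LIMIT 10;
```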

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: tidb菜鸟一只 | Original post link

The larger the range, the more data has to be scanned. LIMIT only restricts the amount of data returned, not the amount of data scanned.

| username: FutureDB | Original post link

Looking at the two execution plans, one has an execution time close to 1ms and the other 1.5ms. The difference isn’t that significant, right?

| username: Kongdom | Original post link

The LIMIT 10 is applied at the TiDB layer when the results are aggregated, not by having each TiKV node fetch only 10 rows. Since the system is distributed, if each TiKV node fetched only 10 rows, the aggregation at the TiDB layer might not end up with the right 10 rows.

| username: TiDBer_H5NdJb5Q | Original post link

I don’t quite understand this. Could you please explain it again?

| username: Kongdom | Original post link

For example, in a standard deployment with 3 TiKV nodes, 90 rows are distributed across the 3 nodes: rows 1-30 on node A, 31-60 on node B, and 61-90 on node C.
In a centralized database, all 90 rows are stored on a single node.
Now suppose you run a query with LIMIT 10.
In a centralized database, since all 90 rows sit on one node, you can fetch 10 rows directly from the storage node and return them to the client.
In a distributed database, the data is spread across 3 nodes, so you fetch 30 rows from each node, aggregate the 90 rows on the TiDB node, sort them, and take the top 10. If each storage node returned only 10 rows, you would aggregate just 30 rows, and those 30 rows may not contain the top 10 of the full 90.
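
A hedged way to check where the LIMIT actually runs, reusing the assumed table and column names from the sketch in the opening post: in TiDB's EXPLAIN output, the task column shows whether each operator executes at the TiDB layer (root) or is pushed down to TiKV (cop[tikv]).

```sql
-- Whether the Limit operator appears under task = root (TiDB layer)
-- or task = cop[tikv] (pushed down to the coprocessor) shows where
-- the LIMIT is actually applied.
EXPLAIN
SELECT * FROM t
WHERE obj_key > 'start_key'
LIMIT 10;
```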

| username: TiDBer_i3pbMJ65 | Original post link

My data is stored in TiKV in order by the primary key. The query here fetches 10 consecutive rows, which should all sit on a single Region, so they could be returned from a single TiKV node without querying all of them. Even if every TiKV node were queried and the limit applied on the TiDB node, the latency shouldn’t differ by tens or hundreds of times, right?

| username: TiDBer_i3pbMJ65 | Original post link

The execution plans show one at close to 1 ms and the other at about 1.5 ms, which by itself is not a big difference, but:

The last line of the client output in the two scenarios is, respectively:
4 rows in set (0.00 sec)
4 rows in set (0.37 sec)

So the actual latency of the two queries differs significantly.

| username: h5n1 | Original post link

Is this situation consistently reproducible every time?

| username: TiDBer_i3pbMJ65 | Original post link

Yes, it reproduces consistently!
I have an environment here where it can be reproduced every time. Is there an expert who could take a look remotely?

| username: h5n1 | Original post link

Run TRACE SELECT … on the two SQL statements and compare them.
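
A minimal sketch of this suggestion, again with the assumed table and column names from the opening post. TiDB's TRACE statement prints the internal steps of executing a statement and the time spent in each.

```sql
-- Trace both statements and compare where the time goes
-- (the 'row' format lists each internal span and its duration).
TRACE FORMAT = 'row'
SELECT * FROM t WHERE obj_key > 'start_key' AND obj_key < 'end_key' LIMIT 10;

TRACE FORMAT = 'row'
SELECT * FROM t WHERE obj_key > 'start_key' LIMIT 10;
```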

| username: TiDBer_i3pbMJ65 | Original post link

The following execution took about five minutes, and the screen was filled with loadRegion prints.

| username: h5n1 | Original post link

In the screenshot of your post, the SQL does not have an ORDER BY clause.

The second SQL seems to need to fetch information about all regions when building the task.
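
One hedged way to see how many Regions the unbounded range has to cover, assuming the table is named t as in the earlier sketches:

```sql
-- Lists the table's Regions with their start and end keys;
-- an unbounded range (start, +inf) on the clustered primary key
-- spans every Region from the start key to the end of the table.
SHOW TABLE t REGIONS;
```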

| username: Kongdom | Original post link

:thinking: A GUID used as a primary key should be scattered across storage, right? That’s how I understand it.

| username: zhaokede | Original post link

The data ranges are different; LIMIT only restricts the amount of data returned.

| username: TiDBer_i3pbMJ65 | Original post link

Whether there is an ORDER BY makes little difference.
Both with ORDER BY:

Both without ORDER BY:

| username: TiDBer_i3pbMJ65 | Original post link

It looks like the difference is in distsql.Select.

| username: TiDBer_jYQINSnf | Original post link

Isn’t it obvious? Without an end range, it needs to scan all regions from the starting position onward, and then aggregate at the TiDB layer. With an end range, it only needs to scan a few regions within the range.
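
A hedged way to confirm this from an actual run, with the same assumed names as above: EXPLAIN ANALYZE executes the statement and reports per-operator execution statistics, including how many keys the coprocessor tasks processed, so the two variants can be compared directly.

```sql
-- Compare actRows and the cop-task execution info
-- (e.g. total_process_keys) between the two variants.
EXPLAIN ANALYZE
SELECT * FROM t WHERE obj_key > 'start_key' AND obj_key < 'end_key' LIMIT 10;

EXPLAIN ANALYZE
SELECT * FROM t WHERE obj_key > 'start_key' LIMIT 10;
```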

| username: TiDBer_bcK9aJQQ | Original post link

He is using the primary key field for filtering.

| username: TiDBer_bcK9aJQQ | Original post link

The field he filters on is the primary key, not some other non-primary-key column. By your reasoning, the query time would keep increasing as the data volume grows. How can this be optimized?