TiDB Slow SQL Optimization Issue: 100 Million Records Using TiFlash

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb慢SQL优化问题,1亿数据,走了tiflash

| username: hacker_77powerful

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.2
[Reproduction Path] Slow SQL
[Encountered Problem: Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

Using TiFlash, the slow SQL encountered is as follows (the query was truncated in the original post; only these fragments remain):

  1 = 1
  89207843, 100;
| username: hacker_77powerful | Original post link

To add, the above SQL execution takes 22 minutes.

| username: 小龙虾爱大龙虾 | Original post link

What is the business logic of this SQL? In the scenario of deep pagination for a single table, can it be optimized from a business perspective?

| username: Jellybean | Original post link

The information currently provided is a bit limited, making it difficult to analyze the cause. Please provide the table structure and execution plan for this table.

It is also recommended to check the Dashboard page to see where this SQL is slow, as it can give a very intuitive view of the analysis process.

| username: TIDB-Learner | Original post link

If you know exactly which engine and index to use, specify them manually.
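
The suggestion above can be written explicitly with TiDB hints; the table name `t` and index name `idx_id` below are placeholders, since the original query was not fully posted:

```sql
-- Force the optimizer to read table t from TiKV instead of TiFlash:
SELECT /*+ READ_FROM_STORAGE(TIKV[t]) */ * FROM t LIMIT 89207843, 100;

-- Or pin a specific index with an index hint:
SELECT * FROM t USE INDEX (idx_id) ORDER BY id LIMIT 89207843, 100;
```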

| username: dba远航 | Original post link

It’s best to post the execution plan.

| username: TiDBer_JUi6UvZm | Original post link

Add slow query logs, query plans, and index information.

| username: TiDBer_QYr0vohO | Original post link

Please share the execution plan.

| username: hacker_77powerful | Original post link

This issue has been resolved by rewriting the SQL directly. In MySQL and TiDB, `LIMIT offset, count` pagination works fine for small tables, but once a table reaches the hundred-million-row scale seen here, the execution plan becomes very poor. I haven't collected exact numbers.

OFFSET and LIMIT are fine for projects with small amounts of data. However, once the data in the table exceeds the server's memory capacity and the whole table needs to be paginated, problems arise.

To serve each pagination request, the database ends up performing an inefficient full table scan.

What is a full table scan? A full table scan (also known as a sequential scan) reads every row of the table in order and checks each one against the query conditions.

This kind of scan is the slowest of all, because it requires a lot of disk I/O, and the cost of moving that data from disk into memory is also significant.

This means that with 100 million rows and an OFFSET of 50 million, the database must produce and step over all 50 million preceding records (most of which are not needed) before it can return the 20 rows specified by LIMIT.

Optimization method:
Rewrite the SQL to seek on the primary key: `WHERE id > 89207843 LIMIT 100`.
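
A sketch of the before/after, assuming a table `t` whose integer primary key is `id` (the actual table name was not posted):

```sql
-- Before: offset pagination; TiDB must produce and discard 89,207,843 rows
-- before returning the 100 requested ones.
SELECT * FROM t ORDER BY id LIMIT 89207843, 100;

-- After: keyset (seek) pagination; with id as the primary key this becomes
-- a range scan that starts directly after id = 89207843.
SELECT * FROM t WHERE id > 89207843 ORDER BY id LIMIT 100;
```

The two forms return the same page only when ids are dense and contiguous; the general pattern is to page forward by feeding the last `id` of each page into the next query's `WHERE id > ?`, instead of incrementing an offset.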

| username: QH琉璃 | Original post link

You’re awesome!

| username: TiDBer_JUi6UvZm | Original post link

:+1: After the modification, what is the runtime?

| username: TiDBer_JUi6UvZm | Original post link

Does the id field have an index?

| username: oceanzhang | Original post link

Pagination with a large OFFSET (`LIMIT offset, count`) is inherently inefficient.

| username: TiDBer_vJGTQABF | Original post link

Create the table as a clustered table and try using TiKV.
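
For illustration, TiDB (v5.0 and later) lets you declare the primary key as clustered, so rows are stored ordered by `id` and a `WHERE id > ?` predicate becomes a direct range seek; the table definition below is a hypothetical sketch:

```sql
CREATE TABLE t (
    id      BIGINT NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id) CLUSTERED  -- rows stored by id; no extra index lookup
);
```

With a nonclustered primary key, lookups by `id` go through a separate index and then fetch the row; a clustered key avoids that extra step.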

| username: shigp_TIDBER | Original post link

A single-table query with no WHERE condition or ORDER BY.
What is the table's data volume? Does id have an index?

| username: hacker_77powerful | Original post link

Unified reply: After optimization, the execution time is 0.01 seconds, and the id is the primary key index.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.