Optimization Suggestions for Slow Return of Large Result Sets in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 返回大结果集慢的问题优化建议

| username: TiDBer_BraiRIcV

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4

Dear friends, when using TiDB to run large SQL queries online, compared to the MySQL we used before, I found:

  1. For queries that return small result sets, such as aggregation and pagination queries, TiDB is faster than MySQL.
  2. For queries that return large result sets, such as exporting data at the million-row level, TiDB is slower than MySQL.

For the second situation, do you have any optimization suggestions (at the TiDB level, business level, or other)? Thank you!!!

| username: h5n1 | Original post link

We need to look at the specific SQL, execution plan, table structure, etc.

| username: TiDBer_BraiRIcV | Original post link

Many of them are export-type SQL, large queries. So from the perspective of SQL optimization, it doesn’t make much sense. The main focus is on other ideas and methods besides optimizing SQL.

| username: h5n1 | Original post link

Simple single-table query? You need to look at the SQL first to know how to consider it.

| username: buddyyuan | Original post link

Export data? You can refer to the design of Dumpling and use _tidb_rowid for sharded export.

| username: Jiawei | Original post link

I think we should first avoid such large SQL queries and split them into smaller batches. Large transactions can be resolved by converting them into smaller transactions.

| username: 我是咖啡哥 | Original post link

This is especially noticeable when it involves sorting. For a single MySQL instance, with indexes, you can directly utilize the sorting feature of the index and continuously pull the data. However, in TiDB, since there are multiple data nodes, if the filtering conditions are not good, the data has to be pulled from the TiKV nodes to the TiDB server node for sorting, which can be slower and may even lead to OOM (Out of Memory).

| username: Kongdom | Original post link

Putting aside SQL when talking about optimization is just being unreasonable :smile:
And professional components should do professional tasks. Since the official high-speed export component is provided, why not use it?

| username: tidb菜鸟一只 | Original post link

At least post the execution plan so we can see it, otherwise how can we know where the problem is…

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.