How does TiDB accurately evaluate the optimal execution plan for each SQL?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB如何准确评估每个SQL的最优执行计划?

| username: alfred

To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
【TiDB Usage Environment】

【Overview】 Scenario + Problem Overview

【Background】 Operations performed

【Phenomenon】 Business and database phenomena

【Problem】 Current issues encountered

【Business Impact】

【TiDB Version】

【Application Software and Version】

【Attachments】 Relevant logs and configuration information

  • TiUP Cluster Display information
  • TiUP Cluster Edit config information

Monitoring (https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana monitoring
  • TiDB Grafana monitoring
  • TiKV Grafana monitoring
  • PD Grafana monitoring
  • Corresponding module logs (including logs 1 hour before and after the issue)

For questions related to performance optimization and troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload.

| username: forever | Original post link

TiDB will automatically or manually collect table statistics. The TiDB optimizer will automatically choose the optimal execution plan based on cost estimation. If an execution plan is bound, it will execute according to the bound execution plan.

| username: ddhe9527 | Original post link

Based on cost evaluation.
The optimizer first determines if the SQL can be executed as a Point_Get. If it is a query on a primary key or unique index, there’s nothing to evaluate; it’s already optimal.
Then, logical optimization is performed, rewriting the SQL equivalently to eliminate some redundant logic, such as converting outer joins to inner joins, etc.
Finally, based on the table’s statistics, the optimizer selects the execution plan with the lowest cost. The cost here is calculated based on the potential CPU, I/O, and other resources consumed during SQL execution, combined with table statistics and index conditions. For example, if an SQL involves an INNER JOIN among tables a, b, and c, there are 32=6 possible sequences for the table associations. There are 4 possible ways to perform the JOIN between tables (HashJoin/MergeJoin/IndexJoin/IndexHashJoin), so the total number of possible JOIN methods for the three tables is 64*2=48 (each association sequence has 2 JOIN points, and each JOIN point has 4 methods to choose from). The optimizer needs to select the execution plan with the lowest cost from at least these 48 possible plans, based on the size of each table, the associated fields, the indexes on other fields, the filtering capability of the indexes, and the resource consumption during execution. This process also highlights the critical importance of the accuracy of statistical information in the optimizer’s evaluation of the optimal execution plan.

| username: alfred | Original post link

What auxiliary improvement methods does TiDB have for suboptimal execution plans caused by the optimizer’s own defects, given that input information such as object or system statistics, compilation environment, and bind variables can still be manually intervened to improve?

| username: ddhe9527 | Original post link

Intervention can be done through SPM

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.