Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB如何准确评估每个SQL的最优执行计划?
To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
【TiDB Usage Environment】
【Overview】 Scenario + Problem Overview
【Background】 Operations performed
【Phenomenon】 Business and database phenomena
【Problem】 Current issues encountered
【Business Impact】
【TiDB Version】
【Application Software and Version】
【Attachments】 Relevant logs and configuration information
- TiUP Cluster Display information
- TiUP Cluster Edit config information
Monitoring (https://metricstool.pingcap.com/)
- TiDB-Overview Grafana monitoring
- TiDB Grafana monitoring
- TiKV Grafana monitoring
- PD Grafana monitoring
- Corresponding module logs (including logs 1 hour before and after the issue)
For questions related to performance optimization and troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload.
TiDB will automatically or manually collect table statistics. The TiDB optimizer will automatically choose the optimal execution plan based on cost estimation. If an execution plan is bound, it will execute according to the bound execution plan.
Based on cost evaluation.
The optimizer first determines if the SQL can be executed as a Point_Get. If it is a query on a primary key or unique index, there’s nothing to evaluate; it’s already optimal.
Then, logical optimization is performed, rewriting the SQL equivalently to eliminate some redundant logic, such as converting outer joins to inner joins, etc.
Finally, based on the table’s statistics, the optimizer selects the execution plan with the lowest cost. The cost here is calculated based on the potential CPU, I/O, and other resources consumed during SQL execution, combined with table statistics and index conditions. For example, if an SQL involves an INNER JOIN among tables a, b, and c, there are 32=6 possible sequences for the table associations. There are 4 possible ways to perform the JOIN between tables (HashJoin/MergeJoin/IndexJoin/IndexHashJoin), so the total number of possible JOIN methods for the three tables is 64*2=48 (each association sequence has 2 JOIN points, and each JOIN point has 4 methods to choose from). The optimizer needs to select the execution plan with the lowest cost from at least these 48 possible plans, based on the size of each table, the associated fields, the indexes on other fields, the filtering capability of the indexes, and the resource consumption during execution. This process also highlights the critical importance of the accuracy of statistical information in the optimizer’s evaluation of the optimal execution plan.
What auxiliary improvement methods does TiDB have for suboptimal execution plans caused by the optimizer’s own defects, given that input information such as object or system statistics, compilation environment, and bind variables can still be manually intervened to improve?
Intervention can be done through SPM
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.