TiDB version 5.4 did not generate the correct execution plan, resulting in prolonged SQL execution time

translator_bot · June 23, 2024, 4:45am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB5.4版本没有生成正确的执行计划，导致sql执行时间过长

| username: 朱振文啊

To improve efficiency, please provide the following information. Clear problem descriptions can lead to faster resolutions:
[TiDB Usage Environment]
Production Environment

[Overview] Scenario + Problem Overview
SQL query generates an inaccurate execution plan, leading to long SQL execution times.
[Background] Actions Taken
Synchronized production data to the debugging cluster, checked the SQL execution plan, and found that the SQL could correctly generate an accurate execution plan, achieving the expected query speed.
[Phenomenon] Business and Database Phenomenon
Business queries are too slow.
[Problem] Current Issues Encountered
1: Two clusters with the same data content and version generate different execution plans.
[Business Impact]
1: Slow queries.
[TiDB Version]
v5.4

[Application Software and Version]

[Attachments] Relevant Logs and Configuration Information

TiUP Cluster Display Information
TiUP Cluster Edit Config Information

Monitoring (https://metricstool.pingcap.com/)

TiDB-Overview Grafana Monitoring
TiDB Grafana Monitoring
TiKV Grafana Monitoring
PD Grafana Monitoring
Corresponding Module Logs (including logs 1 hour before and after the issue)

If the question is related to performance optimization or fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload. tableSql (10.7 KB)

translator_bot · June 23, 2024, 4:45am

| username: tidb狂热爱好者 | Original post link

select from tablea use index (idx_index) where a=999 force index

translator_bot · June 23, 2024, 4:45am

| username: OnTheRoad | Original post link

Are the indexes consistent in both environments?

translator_bot · June 23, 2024, 4:45am

| username: 朱振文啊 | Original post link

Consistently, the database in the debugging cluster is synchronized with the production environment database by DM.

translator_bot · June 23, 2024, 4:45am

| username: forever | Original post link

I feel that the statistical information is still not very accurate. By the way, please also post the index parts of the two environments.

translator_bot · June 23, 2024, 4:45am

| username: 朱振文啊 | Original post link

This approach is not being considered for now, as it may lead to other issues later.

translator_bot · June 23, 2024, 4:45am

| username: tidb狂热爱好者 | Original post link

The indexes of these two tables are obviously different, and the wrong index was used. Sigh, I have nothing more to say. Six composite indexes.

translator_bot · June 23, 2024, 4:45am

| username: wakaka | Original post link

Please provide the table structure and approximate data volume.

translator_bot · June 23, 2024, 4:45am

| username: alfred | Original post link

We can only look for reasons in the accuracy of statistical information and differences in parameter configuration.

translator_bot · June 23, 2024, 4:45am

| username: 朱振文啊 | Original post link

I have just uploaded the table structures of two tables. By comparing the files, the table structures in the two clusters are consistent.

translator_bot · June 23, 2024, 4:45am

| username: 朱振文啊 | Original post link

I have just uploaded the table structures of two tables. Both tables have around ten million records. By comparing the files, the table structures in the two clusters are consistent.

translator_bot · June 23, 2024, 4:46am

| username: 朱振文啊 | Original post link

Well, currently we can only investigate in this direction.

translator_bot · June 23, 2024, 4:46am

| username: wakaka | Original post link

Use force_index to bind the execution plan of this statement, which has the least impact.

translator_bot · June 23, 2024, 4:46am

| username: 朱振文啊 | Original post link

I found the reason. In the production environment cluster, the default value of tidb_analyze_version was changed from 2 to 1 by the operations team. The solution is to change the version back to 2, clear the statistics, and re-analyze the table. Finally, the SQL query will use the correct index.

translator_bot · June 23, 2024, 4:46am

| username: alfred | Original post link

Statistics in Version 2 avoid the significant errors that could occur in large datasets due to hash collisions in Version 1, while maintaining estimation accuracy in most scenarios.