TiDB version 5.4 did not generate the correct execution plan, resulting in prolonged SQL execution time

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB5.4版本没有生成正确的执行计划,导致sql执行时间过长

| username: 朱振文啊

To improve efficiency, please provide the following information. Clear problem descriptions can lead to faster resolutions:
[TiDB Usage Environment]
Production Environment

[Overview] Scenario + Problem Overview
SQL query generates an inaccurate execution plan, leading to long SQL execution times.
[Background] Actions Taken
Synchronized production data to the debugging cluster, checked the SQL execution plan, and found that the SQL could correctly generate an accurate execution plan, achieving the expected query speed.
[Phenomenon] Business and Database Phenomenon
Business queries are too slow.
[Problem] Current Issues Encountered
1: Two clusters with the same data content and version generate different execution plans.
[Business Impact]
1: Slow queries.
[TiDB Version]
v5.4


[Application Software and Version]

[Attachments] Relevant Logs and Configuration Information

  • TiUP Cluster Display Information
  • TiUP Cluster Edit Config Information

Monitoring (https://metricstool.pingcap.com/)

  • TiDB-Overview Grafana Monitoring
  • TiDB Grafana Monitoring
  • TiKV Grafana Monitoring
  • PD Grafana Monitoring
  • Corresponding Module Logs (including logs 1 hour before and after the issue)

If the question is related to performance optimization or fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload. tableSql (10.7 KB)

| username: tidb狂热爱好者 | Original post link

select from tablea use index (idx_index) where a=999 force index

| username: OnTheRoad | Original post link

Are the indexes consistent in both environments?

| username: 朱振文啊 | Original post link

Consistently, the database in the debugging cluster is synchronized with the production environment database by DM.

| username: forever | Original post link

I feel that the statistical information is still not very accurate. By the way, please also post the index parts of the two environments.

| username: 朱振文啊 | Original post link

This approach is not being considered for now, as it may lead to other issues later.

| username: tidb狂热爱好者 | Original post link

The indexes of these two tables are obviously different, and the wrong index was used. Sigh, I have nothing more to say. Six composite indexes.

| username: wakaka | Original post link

Please provide the table structure and approximate data volume.

| username: alfred | Original post link

We can only look for reasons in the accuracy of statistical information and differences in parameter configuration.

| username: 朱振文啊 | Original post link

I have just uploaded the table structures of two tables. By comparing the files, the table structures in the two clusters are consistent.

| username: 朱振文啊 | Original post link

I have just uploaded the table structures of two tables. Both tables have around ten million records. By comparing the files, the table structures in the two clusters are consistent.

| username: 朱振文啊 | Original post link

Well, currently we can only investigate in this direction.

| username: wakaka | Original post link

Use force_index to bind the execution plan of this statement, which has the least impact.

| username: 朱振文啊 | Original post link

I found the reason. In the production environment cluster, the default value of tidb_analyze_version was changed from 2 to 1 by the operations team. The solution is to change the version back to 2, clear the statistics, and re-analyze the table. Finally, the SQL query will use the correct index.

| username: alfred | Original post link

Statistics in Version 2 avoid the significant errors that could occur in large datasets due to hash collisions in Version 1, while maintaining estimation accuracy in most scenarios.