Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 执行计划问题
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
Index situation: Both baseCodeId and version have single-column indexes, but the optimizer chooses the version index. After forcing the use of the index with /
+ USE_INDEX(codedetail0_
,idx_codedetail_basecodeid
)/, it was found that:
The baseCodeId index performs better. It’s unclear why the version index is chosen here.
The statistics are inaccurate.
Check if SPM is being used.
Where can I check if the statistics are accurate? If the statistics are inaccurate, should I just collect the statistics again?
I would like to ask, how can I check this SPM?
estrows actrows, you can check the health of the table.
After completing the table analysis, see if the execution plan is accurate.
If it is still not accurate, check if there is a bound SPM.
Just checked, didn’t use SPM.
Is it possible that the issue is caused by the order by clause? The sorting is done by version.
Health score is 74%. I should collect some statistics first. After collecting, I will check the execution plan.
Would creating a composite index with id+version+return fields be more effective?
Assuming the return fields are few, if there are many, would the first two be better?
Choosing the version
index seems to have operator pushdown, returning only one row to TiDB. However, for the baseCodeId
index, why does it first return 20 rows and then apply a limit of 1?
Collected statistics, but the execution plan still chooses the ‘version’ index.
Each node should return one result, and then TiDB sorts and selects one.
It is possible to create a composite index, but it feels strange. Why does the optimizer choose the index of version? Clearly, selecting baseCodeId would be better for reading.
Try removing the “order by version”.
After removing it, the (basecodeid) index was used.
What does this mean? I don’t quite understand.
Why does this place return a 20?
Because it is distributed, the top 1 from each node is retrieved, and then the top 1 from each node is processed again in TiDB to get the largest one.
So, can I understand it this way: I take the top 1 from each TiKV node, which totals 20 pieces of data. Here, the sum of the top 1 from each TiKV node represents the total. Then, these 20 pieces of data are returned to the TiDB server, and the TiDB server takes the top 1 again.
Yes, but it should be more accurate to say “region.”