Execution Plan Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 执行计划问题

| username: 胡杨树旁

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]


Index situation: Both baseCodeId and version have single-column indexes, but the optimizer chooses the version index. After forcing the use of the index with /+ USE_INDEX(codedetail0_,idx_codedetail_basecodeid)/, it was found that:

The baseCodeId index performs better. It’s unclear why the version index is chosen here.

| username: 裤衩儿飞上天 | Original post link

The statistics are inaccurate.

| username: 近墨者zyl | Original post link

Check if SPM is being used.

| username: 胡杨树旁 | Original post link

Where can I check if the statistics are accurate? If the statistics are inaccurate, should I just collect the statistics again?

| username: 胡杨树旁 | Original post link

I would like to ask, how can I check this SPM?

| username: 裤衩儿飞上天 | Original post link

estrows actrows, you can check the health of the table.
After completing the table analysis, see if the execution plan is accurate.
If it is still not accurate, check if there is a bound SPM.

| username: 胡杨树旁 | Original post link

Just checked, didn’t use SPM.

| username: Kongdom | Original post link

Is it possible that the issue is caused by the order by clause? The sorting is done by version.

| username: 胡杨树旁 | Original post link

Health score is 74%. I should collect some statistics first. After collecting, I will check the execution plan.

| username: Jiawei | Original post link

Would creating a composite index with id+version+return fields be more effective?
Assuming the return fields are few, if there are many, would the first two be better?

| username: 胡杨树旁 | Original post link

Choosing the version index seems to have operator pushdown, returning only one row to TiDB. However, for the baseCodeId index, why does it first return 20 rows and then apply a limit of 1?

| username: 胡杨树旁 | Original post link

Collected statistics, but the execution plan still chooses the ‘version’ index.

| username: Kongdom | Original post link

Each node should return one result, and then TiDB sorts and selects one.

| username: 胡杨树旁 | Original post link

It is possible to create a composite index, but it feels strange. Why does the optimizer choose the index of version? Clearly, selecting baseCodeId would be better for reading.

| username: Kongdom | Original post link

Try removing the “order by version”.

| username: 胡杨树旁 | Original post link

After removing it, the (basecodeid) index was used.

| username: 胡杨树旁 | Original post link

What does this mean? I don’t quite understand.


Why does this place return a 20?

| username: Kongdom | Original post link

Because it is distributed, the top 1 from each node is retrieved, and then the top 1 from each node is processed again in TiDB to get the largest one.

| username: 胡杨树旁 | Original post link

So, can I understand it this way: I take the top 1 from each TiKV node, which totals 20 pieces of data. Here, the sum of the top 1 from each TiKV node represents the total. Then, these 20 pieces of data are returned to the TiDB server, and the TiDB server takes the top 1 again.

| username: Kongdom | Original post link

Yes, but it should be more accurate to say “region.”