Execution Plan Issues

translator_bot · June 23, 2024, 12:09am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 执行计划问题

| username: 胡杨树旁

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]

Index situation: Both baseCodeId and version have single-column indexes, but the optimizer chooses the version index. After forcing the use of the index with /+ USE_INDEX(codedetail0_,idx_codedetail_basecodeid)/, it was found that:

The baseCodeId index performs better. It’s unclear why the version index is chosen here.

translator_bot · June 23, 2024, 12:09am

| username: 裤衩儿飞上天 | Original post link

The statistics are inaccurate.

translator_bot · June 23, 2024, 12:09am

| username: 近墨者zyl | Original post link

Check if SPM is being used.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

Where can I check if the statistics are accurate? If the statistics are inaccurate, should I just collect the statistics again?

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

I would like to ask, how can I check this SPM?

translator_bot · June 23, 2024, 12:09am

| username: 裤衩儿飞上天 | Original post link

estrows actrows, you can check the health of the table.
After completing the table analysis, see if the execution plan is accurate.
If it is still not accurate, check if there is a bound SPM.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

Just checked, didn’t use SPM.

translator_bot · June 23, 2024, 12:09am

| username: Kongdom | Original post link

Is it possible that the issue is caused by the order by clause? The sorting is done by version.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

Health score is 74%. I should collect some statistics first. After collecting, I will check the execution plan.

translator_bot · June 23, 2024, 12:09am

| username: Jiawei | Original post link

Would creating a composite index with id+version+return fields be more effective?
Assuming the return fields are few, if there are many, would the first two be better?

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

Choosing the version index seems to have operator pushdown, returning only one row to TiDB. However, for the baseCodeId index, why does it first return 20 rows and then apply a limit of 1?

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

Collected statistics, but the execution plan still chooses the ‘version’ index.

translator_bot · June 23, 2024, 12:09am

| username: Kongdom | Original post link

Each node should return one result, and then TiDB sorts and selects one.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

It is possible to create a composite index, but it feels strange. Why does the optimizer choose the index of version? Clearly, selecting baseCodeId would be better for reading.

translator_bot · June 23, 2024, 12:09am

| username: Kongdom | Original post link

Try removing the “order by version”.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

After removing it, the (basecodeid) index was used.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

What does this mean? I don’t quite understand.

Why does this place return a 20?

translator_bot · June 23, 2024, 12:09am

| username: Kongdom | Original post link

Because it is distributed, the top 1 from each node is retrieved, and then the top 1 from each node is processed again in TiDB to get the largest one.

translator_bot · June 23, 2024, 12:09am

| username: 胡杨树旁 | Original post link

So, can I understand it this way: I take the top 1 from each TiKV node, which totals 20 pieces of data. Here, the sum of the top 1 from each TiKV node represents the total. Then, these 20 pieces of data are returned to the TiDB server, and the TiDB server takes the top 1 again.

translator_bot · June 23, 2024, 12:09am

| username: Kongdom | Original post link

Yes, but it should be more accurate to say “region.”