After analyzing the table, there are still cases where the statistics are inaccurate

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: analyze table 之后,统计信息还是会有不准的情况

| username: BraveChen

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 5.4.3
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact]
The most frequently used TiDB version is 5.*, and I remember encountering situations many times during SQL tuning where the execution plan was inaccurate even after the table statistics were collected.

Example: Today, I encountered a subquery with a full table of 140 million rows, needing 40 million rows. Due to inaccurate statistics (just manually collected), it chose to use index back to the table, which was slower. In this case, a full table scan would have been better. The issue was resolved after using ignore index.
image
image

Has anyone else encountered situations where the execution plan is still inaccurate after collecting statistics for some tables?
I searched but couldn’t find an issue related to this problem, though I remember it happening many times.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: buddyyuan | Original post link

Is the prepare_plan_cache enabled? Try turning it off and see if it helps.

| username: BraveChen | Original post link

I checked,
it’s not enabled.

| username: tidb菜鸟一只 | Original post link

SHOW stats_histograms WHERE table_name=‘t1’;
Let’s take a look at the statistics of the table.

| username: 我是咖啡哥 | Original post link

Has the histogram information been collected?

| username: BraveChen | Original post link

I directly used “analyze table t1”. In theory, it should collect everything.

| username: BraveChen | Original post link

The image you provided is not visible. Please provide the text you need translated.

| username: h5n1 | Original post link

plan replayer dump explain analyze select xxxx; Export the information

| username: BraveChen | Original post link

replayer_single_Sa3L2SH9CrYmcuscY9E3MQ==_1677725054367414336.zip|attachment (579.4 KB)

| username: h5n1 | Original post link

What version is your database? I’m using 6.1.1 to import.

| username: BraveChen | Original post link

5.4.3

| username: BraveChen | Original post link

The current version is 5.4.3, but I have also encountered this issue in other version 5 releases.

| username: tidb菜鸟一只 | Original post link

Could you share the table structure? How many indexes does this table have? On which fields are they?

| username: BraveChen | Original post link

I don’t quite understand, the table structure and statistics are unrelated, right?

| username: h5n1 | Original post link

show stats_buckets displays the histogram information of a column. If it doesn’t show up, use the column as a WHERE condition to execute an SQL query, and then check the range of your condition against the buckets.

| username: BraveChen | Original post link

There is no problem with the statistical information query.

| username: h5n1 | Original post link

It feels like it is directly estimated by row_count / distinct_count.

| username: tidb菜鸟一只 | Original post link

I don’t understand, why is there no do_id information in stats_histograms?

| username: BraveChen | Original post link

:joy: :joy: :joy: This is a bit ridiculous. You can’t just say this number is calculated this way. I suspect that the statistics information has been collected and the corresponding view information has been updated, but it is asynchronously loaded into the TiDB cache for updating. It is possible that due to certain parameter limitations, a small portion of the table’s statistics information cannot be updated.

| username: BraveChen | Original post link

Are you referring to the column name?