After setting the parameters to enable automatic analyze, it was found that automatic optimization was not triggered

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 设置开启自动analyze相关参数后,发现并没有触发自动优化。

| username: vcdog

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0
[Reproduction Path] After setting the parameters to enable auto-analyze, it was found that automatic optimization was not triggered.
[Encountered Problem: Problem Phenomenon and Impact]
img_v3_02bo_7c8b1f14-fb72-453d-9f19-555bd93d1e1g

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld | Original post link

Take a look at this introduction: SQL 操作常见问题 | PingCAP 文档中心

Additionally, here are some recommended articles for you to read:

They are quite well-written, and I hope you find the answers you are looking for!

| username: vcdog | Original post link

Okay, I will first take a look at the two documents you recommended. Recently, this issue has been bothering me.

| username: mono | Original post link

There are trigger conditions. It hasn’t been triggered. That’s good. It means you can sleep peacefully every day for now! :grinning:

| username: 濱崎悟空 | Original post link

Still need to wait for the trigger conditions to be met.

| username: lemonade010 | Original post link

When a single partition of a table or partitioned table reaches 1000 records, and the ratio of (modifications/total current rows) in the table or partition exceeds the tidb_auto_analyze_ratio, the ANALYZE statement will be automatically triggered.

The default value of tidb_auto_analyze_ratio is 0.5, which means auto analyze is enabled by default. Note that it is not recommended to set this variable value greater than or equal to the pseudo-estimate-ratio (default value is 0.8), otherwise the optimizer may use pseudo statistics. Starting from TiDB v5.3.0, the tidb_enable_pseudo_for_outdated_stats variable was introduced. When set to OFF, pseudo statistics will not be used even if the statistics are outdated.

| username: vcdog | Original post link

However, by checking the health status, there are many tables with a health score below 50, indicating that the lower the health score, the higher the data changes in the corresponding table. Theoretically, when more than 50% of the rows in a table are modified, it triggers an automatic ANALYZE update.

### `tidb_auto_analyze_ratio`

* Scope: GLOBAL
* Persisted to cluster: Yes
* Default value: `0.5`
* This variable is used to set the threshold for TiDB to automatically execute [`ANALYZE TABLE`](https://docs.pingcap.com/zh/tidb/v6.5/sql-statement-analyze-table) in the background to update statistics. `0.5` means that when more than 50% of the rows in a table are modified, it triggers an automatic ANALYZE update. You can specify `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time` to restrict the time for automatic ANALYZE.
| username: vcdog | Original post link

In the test environment database, I found a table with a data volume of over 12.4k. I cloned the table and inserted the data into the new table. Then, I performed a full table update operation on the new table. After observation, the automatic analyze operation was not triggered for the table.


MySQL [maindb]> show variables like '%analyze%';
+------------------------------+-------------+
| Variable_name                | Value       |
+------------------------------+-------------+
| tidb_analyze_version         | 2           |
| tidb_auto_analyze_end_time   | 23:59 +0000 |
| tidb_auto_analyze_ratio      | 0.5         |
| tidb_auto_analyze_start_time | 00:00 +0000 |
| tidb_enable_analyze_snapshot | OFF         |
| tidb_enable_auto_analyze     | ON          |
| tidb_enable_fast_analyze     | ON          |
| tidb_max_auto_analyze_time   | 43200       |
| tidb_mem_quota_analyze       | -1          |
| tidb_persist_analyze_options | ON          |
+------------------------------+-------------+
10 rows in set (0.00 sec)

MySQL > create table xxl_job_log_20240611_bk like xxl_job_log;
Query OK, 0 rows affected (0.51 sec)

MySQL > insert into xxl_job_log_20240611_bk select * from xxl_job_log;
Query OK, 12645 rows affected (0.54 sec)
Records: 12645  Duplicates: 0  Warnings: 0

MySQL > select count(*) from xxl_job_log_20240611_bk;
+----------+
| count(*) |
+----------+
|    12645 |
+----------+
1 row in set (0.01 sec)

MySQL > update xxl_job_log_20240611_bk t1 set t1.job_group=10086;
Query OK, 12645  rows affected (0.22 sec)
Rows matched: 12645  Changed: 12645  Warnings: 0

MySQL [maindb]> SHOW STATS_HEALTHY where  table_name='xxl_job_log_20240611_bk';
+---------+-------------------------+----------------+---------+
| Db_name | Table_name              | Partition_name | Healthy |
+---------+-------------------------+----------------+---------+
| test    | xxl_job_log_20240611_bk |                |      63 |
+---------+-------------------------+----------------+---------+
1 row in set (0.00 sec)
| username: xfworld | Original post link

The SHOW STATS_HEALTHY statement can estimate the accuracy of statistics, which is the healthiness. Tables with low healthiness may generate suboptimal query execution plans.

When the healthiness of a table drops below the tidb_auto_analyze_ratio, the ANALYZE statement will be automatically executed.

Refer here:

| username: vcdog | Original post link

  1. In TiDB, what is the trigger strategy for auto analyze?
    Reference link: SQL 操作常见问题 | PingCAP 文档中心
  2. When the health of the table drops below tidb_auto_analyze_ratio, the ANALYZE statement will be automatically executed.

Reference link:

Why do these two documents seem a bit contradictory? I re-adjusted and tested it again, but still couldn’t trigger the automatic analyze collection.

| username: vcdog | Original post link

  1. The current parameters for automatic analysis are as follows:

  2. The current health of the table:

  3. Batch full table data update:

  4. Check the health of the table again:


At this point, the health is 0, which is much less than tidb_auto_analyze_ratio=0.9, but automatic collection was not triggered.

| username: 胡杨树旁 | Original post link

Automatically set the start time and end time to Time+0800 and give it a try.

| username: xfworld | Original post link

Please check the other parameters:

Additionally, you can refer to:

show analyze status
You can check if any analyze operations have been executed…

| username: Kongdom | Original post link

Can manually running ANALYZE TABLE improve health?

Also, it shouldn’t be caused by this version, right?

| username: FutureDB | Original post link

Indeed, check if the task of collecting statistical information has failed.

| username: vcdog | Original post link

mysql> show analyze status where table_name ='xxl_job_log_20240611_bk';
+--------------+-------------------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+---------------+------------+-------------------+----------+----------------------+
| Table_schema | Table_name              | Partition_name | Job_info                                                                | Processed_rows | Start_time          | End_time            | State    | Fail_reason | Instance      | Process_ID | Remaining_seconds | Progress | Estimated_total_rows |
+--------------+-------------------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+---------------+------------+-------------------+----------+----------------------+
| bj_sjzt_db   | xxl_job_log_20240611_bk |                | auto analyze table all columns with 256 buckets, 500 topn, 1 samplerate |          12645 | 2024-06-11 17:12:46 | 2024-06-11 17:12:53 | finished | NULL        | 10.3.9.9:4000 |       NULL | NULL              |     NULL |                 NULL |
+--------------+-------------------------+----------------+-------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+---------------+------------+-------------------+----------+----------------------+
1 row in set (0.02 sec)
| username: vcdog | Original post link

By checking the analyze status, it was found that automatic collection was triggered. In terms of timing, it was not immediately triggered after the manual update.

| username: ziptoam | Original post link

Has the issue been resolved? This problem is quite complex.

| username: vcdog | Original post link

It can be considered resolved. There will be a delay in trigger accuracy, but it doesn’t affect automatic collection. Currently, we have configured automatic collection from midnight to 8 AM every day.

| username: vcdog | Original post link

Additionally, I want to give the R&D team a regular read-only account and grant them permission to view the health status. What permissions should I grant? I looked at the syntax tree, but it seems there isn’t one: