What is the frequency of internal analyze table?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 内部analyze table的频率是?

| username: xingzhenxiang

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.5.1
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]


Why is it taking so long for me?

| username: Jellybean | Original post link

Question 1: The frequency of Analyze table is related to the cluster configuration and the proportion of updated rows in the table. You can check the official documentation for details.

Question 2: Analyze table will re-read all data rows at the default ratio, re-analyze and calculate, and then update the statistics. This process usually consumes a lot of resources and time. The time consumption varies depending on the size of the table and the available resources of the cluster at that time.

| username: TiDBer_pkQ5q1l0 | Original post link

Sampling frequency depends on these parameters.

| username: tidb菜鸟一只 | Original post link

If your table is large enough and there are enough insert, delete, and update operations, theoretically, if you enable automatic analysis for 24 hours, it will continuously collect data automatically around the clock.

| username: xingzhenxiang | Original post link

There is basically no write, the table is very large.

| username: xingzhenxiang | Original post link

Can it be changed to analyze the table once a week?

| username: TiDBer_pkQ5q1l0 | Original post link

I haven’t changed it every week, but I can control the starttime and endtime. I usually do analyze between 0:00 and 6:00, and I have also set the tidb_auto_analyze_ratio relatively low.

| username: dba-kit | Original post link

Did the analyze fail during the import and never succeed afterward? You can try to manually analyze it. By default, automatic analyze will only start if more than 50% of the total rows have changed.

| username: db_user | Original post link

show analyze status; Check the specific situation.

| username: xingzhenxiang | Original post link

Let me see, these are the related results, there are quite a few, right?

| username: tidb菜鸟一只 | Original post link

The memb_tabb table is quite large, and the automatic collection of statistics takes too long and gets interrupted. It is recommended to manually collect the statistics. You can refer to the database’s busy level and choose a less busy time period, and increase the concurrency for collection. More details can be found here: 常规统计信息 | PingCAP 文档中心
Additionally, you don’t have to collect statistics for the entire table. You can set an appropriate sampling rate for collection using WITH FLOAT_NUM SAMPLERATE to specify the sampling rate. More details can be found here: 常规统计信息 | PingCAP 文档中心

| username: xingzhenxiang | Original post link

Please help me take a look, I have a lot of failures in this status.

| username: dba-kit | Original post link

I see many interruptions. Please share the current cluster’s analyze-related parameters, something like this:

| username: xingzhenxiang | Original post link

This is the relevant information. I set this to 0, and then there is no time limit? Right?

| username: xingzhenxiang | Original post link

I have already set it to 0.

| username: db_user | Original post link

The automatic analyze has a maximum time set, and it seems that large tables are timing out. You can first control the analyze to a very short time, setting the start_time and end_time within a few hours. Then, manually analyze the failed tables one by one using the command line with analyze table table_name. Make sure to execute this during off-peak business hours, or monitor the execution to avoid impacting business operations.

| username: xingzhenxiang | Original post link

My table is not updated frequently, only once a month.

set global tidb_max_auto_analyze_time = 0;

Is this sufficient?

| username: db_user | Original post link

You can first try manually analyzing the time it takes. In lower versions, manual analysis is generally much faster than automatic. If this large table is not frequently queried, it doesn’t matter much. However, TiDB seems to have no way to set whether to analyze based on the table level, specify the time point, and the trigger change ratio.

| username: xingzhenxiang | Original post link

Mine is currently performing an automatic analyze, but it restarts after timing out. I still need to learn the manual method. Let’s see how it goes without time restrictions this time. Manual analyze can be concurrent, so it’s faster, right?

| username: db_user | Original post link

The logic of manual analyze seems to be different. I haven’t looked into the principles in detail, but it’s better to execute analyze manually on a single table. Sometimes automatic analyze triggers some bugs, which are more noticeable in large tables, such as GC-related timeouts or inability to kill processes. You can search the forum for related posts; there are many discussions on this topic.