Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb 收集表统计信息是全量收集嘛
May I ask, teachers, I believe that when TiDB automatically collects table statistics in the background (automatically executes the analyze table statement) and when manually executing the analyze table statement, the default behavior is to collect the table’s statistics in full, right?
Both collect the full amount. However, the collection speed is different. Manual execution allows setting the degree of parallelism, while automatic collection does not allow setting it.
Default full.
Reference: 统计信息简介 | PingCAP 文档中心
Executing the ANALYZE TABLE
statement in TiDB takes longer than in MySQL or InnoDB. InnoDB samples only a small number of pages, but TiDB completely reconstructs a series of statistics. Scripts suitable for MySQL mistakenly assume that executing ANALYZE TABLE
takes a shorter time.
For faster analysis, you can set tidb_enable_fast_analyze
to 1
to enable the fast analysis feature. The default value of this parameter is 0
.
After enabling the fast analysis feature, TiDB will randomly sample about 10,000 rows of data to build statistics. Therefore, in cases of uneven data distribution or small data volumes, the accuracy of the statistics will be relatively poor. This may lead to suboptimal execution plans, such as choosing the wrong index. If you can tolerate the execution time of the regular ANALYZE
statement, it is recommended to disable the fast analysis feature.
tidb_enable_fast_analyze
is an experimental feature and currently does not fully match the statistics of tidb_analyze_version=2
. Therefore, when enabling tidb_enable_fast_analyze
, you need to set the value of tidb_analyze_version
to 1
.
Thank you for your reply, expert.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.