[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.7
[Reproduction Path] Import data through DTS in full
[Encountered Problem: Problem Phenomenon and Impact] Executing show stats_healthy reveals that 70% of the tables (total number of tables is over 20,000) have a statistics value of 0, while only 30% of all tables have a value of 100. Has TiDB done any special handling for table statistics collection? Because this kind of problem has not been encountered in native MySQL.
I understand that the collection of statistics needs to be handled automatically by the product itself. From the official documentation, there isn’t much content about automatic handling, mainly discussing the tidb_analyze_version configuration, which is currently set to 2 by default. After comparing some configurations related to statistics collection with the official documentation, I didn’t find any clues. Is it possible that TiDB’s statistics collection tasks run in a single thread? Wouldn’t the collection speed be too slow in scenarios with tens of thousands of tables?
It seems there could be a serious pitfall here. If one day the statistics for online tables (in scenarios with a relatively large number of tables) become inaccurate, the execution plan could be completely messed up.
No worries, the TiDB version is still iterating, and the subsequent resource isolation capabilities are aimed at maximizing resource utilization scenarios, which will gradually improve.
Currently, if you encounter similar issues, you may need to handle them manually (or use hints, so even if the collected statistics are outdated, it won’t affect the SQL optimization and matching process).
The newly imported data needs to have all statistics collected. You can take a look at the Lightning import; it ultimately needs to analyze all tables.
Moreover, I also found an issue: the manually triggered analyze table operation is actually performed synchronously (with a sampling ratio of around 10%, so it takes a relatively long time), unlike native MySQL, which performs it asynchronously.