Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Does `analyze table` lock the table it operates on while it is running?
[TiDB Usage Environment] Production Environment
[TiDB Version] v5.0.4
I would like to know whether the table is locked while an `analyze table` operation is running. We have several tables with billions of rows and need to keep the business running uninterrupted, so I would like a detailed understanding of whether these billion-row tables get locked during analyze and how their usage is affected.
PS: I asked this question on ChatGPT, but I’m not sure whether its answer is accurate.
Thank you very much~
It won’t lock the table; you can refer to this post:
It won’t lock the table, but if you need to collect statistics on a very large table, it is recommended to adjust the relevant parameters first and collect them manually; otherwise the job can easily fail due to a GC timeout.
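For reference, a minimal sketch of that idea, assuming the parameter in question is the `tidb_gc_life_time` system variable (available as a system variable since v5.0); the values and table name are illustrative:

```python
# Sketch: temporarily lengthen the GC life time so a long manual analyze on a
# huge table is not cut off by GC, then restore the default. Values and the
# table name are illustrative.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute("SET GLOBAL tidb_gc_life_time = '24h'")  # default is 10m
        cur.execute("ANALYZE TABLE `mydb`.`big_table`")      # long-running manual collection
        cur.execute("SET GLOBAL tidb_gc_life_time = '10m'")  # restore afterwards
finally:
    conn.close()
```

Remember to restore the original value afterwards, since a longer GC life time means more old MVCC versions are retained in TiKV.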
After a GC timeout failure, will the analyze be retried automatically? We had a case where, even with the auto_analyze parameter enabled, the database was extremely slow; on inspection, a table with 150 million rows had a health score of 0.
GC handling and analyze will not conflict with each other; otherwise the business would be interrupted… (which would not be acceptable for database workloads).
However, analyze does have an impact on performance, so it is best to start it when the business is not busy… This way, the success rate will also be higher.
Automatic statistics collection that fails due to a GC timeout will be attempted again automatically next time, but it is likely to fail again, so it is recommended to collect statistics manually for large tables.
It should not lock the table, but it will affect the cluster performance. Generally, it is set to automatically collect during off-peak business hours.
It doesn’t lock the table but will affect performance.
It should be an online operation. It won’t lock the table but will affect performance.
When collecting, pay attention to the GC settings; otherwise statistics collection on large tables will not succeed.
It doesn’t lock the table, but it does consume resources and affect performance.
Hello, our current idea is:
- Add a system scheduled task that runs a script at 1 AM every day, finds all tables with a health score below 80, and performs analyze on them in sequence (equivalent to manually analyzing large tables periodically; see the sketch after this list);
- Enable the tidb auto analyze parameter.
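A minimal sketch of the 1 AM script described in the first step, assuming a MySQL-protocol connection to TiDB via pymysql; the connection details, threshold, and system-schema filter are placeholders:

```python
# Hypothetical 1 AM scheduled task: analyze every table whose statistics
# health score is below 80, one table at a time. Connection details, the
# threshold, and the system-schema filter are for illustration only.
import pymysql

THRESHOLD = 80

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
try:
    with conn.cursor() as cur:
        # Output columns include Db_name, Table_name, ..., Healthy (last column).
        cur.execute("SHOW STATS_HEALTHY")
        seen = set()
        for row in cur.fetchall():
            db, table, healthy = row[0], row[1], row[-1]
            if db == "mysql" or healthy >= THRESHOLD or (db, table) in seen:
                continue
            seen.add((db, table))
            # Sequential execution: only one heavy analyze runs at a time.
            cur.execute(f"ANALYZE TABLE `{db}`.`{table}`")
finally:
    conn.close()
```

Running the analyzes sequentially keeps at most one heavy statistics job on the cluster at any time.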
We are not sure if this solution is feasible.
Additionally, we have a question: in a special case, for example, if the scheduled script is running analyze on a large table at 1 AM, will TiDB’s internal auto analyze also trigger an analyze on the same table? Would that affect the normal use of the system?
In fact, the second step alone is sufficient, since the threshold for triggering automatic analysis is configurable.
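For example, the trigger threshold is controlled by the `tidb_auto_analyze_ratio` system variable; the value below is illustrative:

```python
# Sketch: adjust the fraction of modified rows that triggers auto analyze.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL tidb_auto_analyze_ratio = 0.3")  # default is 0.5
conn.close()
```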
Previously, we configured automatic analyze, but it was reported that the system was getting slower and slower, eventually becoming unusable. Upon investigation, we found that a table with 150 million rows had a health score of 0.
That alone won’t work. If you only enable the automatic analysis feature, the analyze on the large table will keep being cut off by GC, retrying in a loop without ever succeeding.
My current approach is similar to yours. I manually trigger analysis for large tables at midnight every day, so they won’t be affected by GC. Small tables are set to analyze automatically. So far, there haven’t been any issues. As for potential conflicts, you can separate the time periods for manual and automatic analysis.
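If it helps, one way to separate the time periods is to confine auto analyze to its own window with the start/end time variables; the times below are illustrative and assume the manual job runs at 1 AM:

```python
# Sketch: restrict auto analyze to 03:00-06:00 so it cannot overlap the 1 AM manual job.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL tidb_auto_analyze_start_time = '03:00 +0800'")
    cur.execute("SET GLOBAL tidb_auto_analyze_end_time = '06:00 +0800'")
conn.close()
```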