Why does auto analyze table always fail for large tables, and how to resolve it?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大表auto analyzetable总是失败,应该是什么原因,怎么解?

| username: 扬仔_tidb

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.3.3
[Reproduction Path] Always reproducible
[Encountered Problem: Phenomenon and Impact]
We changed the server configuration to tidb_enable_fast_analyze=ON today.
TiDB cluster server configuration: 16C/32G. Several large tables with over a billion rows show "failed" for auto analyze in select * from INFORMATION_SCHEMA.ANALYZE_STATUS.
SHOW STATS_META:
modify_count is about 200 million, row_count about 96 million
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


| username: tidb菜鸟一只 | Original post link

After enabling the tidb_enable_fast_analyze feature, TiDB randomly samples about 10,000 rows of data to build statistics. This method is strongly discouraged for large tables in production. Note also that this parameter only takes effect together with tidb_analyze_version=1; what value is that variable set to in your system?
set global tidb_build_stats_concurrency=16;
set global tidb_distsql_scan_concurrency=64;
It is still recommended to raise the two parameters above to appropriate values during system idle periods. For large tables, it is better to collect statistics manually with WITH FLOAT_NUM SAMPLERATE to sample a percentage of the data.
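The advice above can be sketched as a sequence of statements (a sketch only; mydb.big_table is a placeholder name, and the exact values should be tuned to your cluster):

```sql
-- tidb_enable_fast_analyze only takes effect with tidb_analyze_version=1; check both
SHOW VARIABLES LIKE 'tidb_analyze_version';
SHOW VARIABLES LIKE 'tidb_enable_fast_analyze';

-- Raise analyze concurrency during an idle window
SET GLOBAL tidb_build_stats_concurrency = 16;
SET GLOBAL tidb_distsql_scan_concurrency = 64;

-- Manually sample a fraction of a large table (10% here; adjust to avoid OOM)
ANALYZE TABLE mydb.big_table WITH 0.1 SAMPLERATE;
```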

| username: 扬仔_tidb | Original post link

The concurrency level cannot be set. I would like to ask how to set the sampling rate for auto analyze on large tables.

| username: 扬仔_tidb | Original post link

The concurrency values have already been set to 16 and 64. Executing analyze table xxxx with 0.1 samplerate; at that sampling rate causes the TiDB server to OOM and restart.

| username: tidb菜鸟一只 | Original post link

Starting from v6.1.0, TiDB introduced a memory limit for statistics collection. You can control the maximum total memory usage when TiDB updates statistics through the tidb_mem_quota_analyze variable.
However, since you are using version 5.3, I suggest you try changing the sampling rate to 0.01, which is 1%… Your table is indeed quite large.
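For example (the quota value and table name below are illustrative, not from the thread):

```sql
-- On v6.1.0+ only: cap total memory used by statistics collection (bytes; 4 GiB here)
SET GLOBAL tidb_mem_quota_analyze = 4294967296;

-- On v5.3, there is no such quota; work around OOM by lowering the sampling rate instead
ANALYZE TABLE mydb.big_table WITH 0.01 SAMPLERATE;
```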

| username: TiDBer_Terry261 | Original post link

Is the execution time too long, exceeding 12 hours? If so, you can increase the timeout period.

| username: 扬仔_tidb | Original post link

There was no timeout; it might be that, with so many large tables, the analyze ran longer than the GC life time. I'm now trying a 0.01 sampling rate. But there's still a problem: even though version 6 has memory limits to protect the server, analyze can still fail on large tables. How do you all solve this? By manually lowering the sampling rate?
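If the failure really is the analyze outliving the GC window, one way to check and temporarily extend it on v5.x is through the mysql.tidb table (a sketch; the 24h value is illustrative, and the original setting should be restored afterwards):

```sql
-- A long-running ANALYZE can fail with "GC life time is shorter than
-- transaction duration" if it runs longer than the MVCC retention window
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME = 'tikv_gc_life_time';

-- Temporarily extend the window before a long manual analyze
UPDATE mysql.tidb SET VARIABLE_VALUE = '24h'
WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```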

| username: weixiaobing | Original post link

You can check if the OOM is caused by this parameter.

| username: WalterWj | Original post link

Collect manually.

| username: xingzhenxiang | Original post link

Based on my manual execution experience, the speed is indeed faster. My commands are as follows, and I hope they are helpful to you:

Stop automatic collection:
set global tidb_auto_analyze_end_time='01:00 +0000'; -- takes effect immediately
set global tidb_max_auto_analyze_time=600; -- takes effect immediately; any auto analyze already running longer than this is also killed

Set the concurrency parameter:
set global tidb_build_stats_concurrency=8; -- concurrency used when executing ANALYZE statements

Run the collection manually:
analyze table Table_schema.Table_name; -- manual collection is much faster than automatic; in 15 minutes it gathered more data than the automatic process did in 3 hours
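To watch whether a manual run is progressing or has failed, the ANALYZE_STATUS table can be polled (a sketch; column set as of v5.x):

```sql
-- Most recent analyze jobs first; STATE shows running / finished / failed
SELECT TABLE_SCHEMA, TABLE_NAME, JOB_INFO, PROCESSED_ROWS, START_TIME, STATE
FROM INFORMATION_SCHEMA.ANALYZE_STATUS
ORDER BY START_TIME DESC;
```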

| username: 扬仔_tidb | Original post link

There are several large tables in production, and manually running analyze with a 0.1 sample rate easily OOMs the tidb-server. The rate can only be reduced to around 0.01.

| username: 扬仔_tidb | Original post link

I’m envious that you all have version 6. We recently installed a new version, and the other machines are on 5.2 and 5.3. Can they be directly upgraded to 6.5?

| username: xingzhenxiang | Original post link

Sure, I upgraded directly from v4.0.11 to v6.5.1. Here is the link to the article:

| username: 扬仔_tidb | Original post link

Thank you so much.

| username: Jellybean | Original post link

It's fine to set a lower sampling rate for large tables; try 0.001 (one in a thousand) or even 0.0001 (one in ten thousand). The main thing is to make sure the analyze operation does not impact the running cluster. Afterwards, just check that the statistics have actually been updated.
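A sketch of that verification step (database and table names are placeholders):

```sql
-- Low-rate manual collection for a very large table
ANALYZE TABLE mydb.big_table WITH 0.001 SAMPLERATE;

-- Confirm the stats were refreshed: Update_time should be recent
-- and Modify_count should have dropped back toward zero
SHOW STATS_META WHERE Db_name = 'mydb' AND Table_name = 'big_table';
```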

| username: 胡杨树旁 | Original post link

It is recommended to use a script to collect statistics manually for large tables.

| username: Alex920 | Original post link

The table is too large.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.