Still can't figure out why automatic statistics collection always fails, but manual collection succeeds?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 还是没搞明白为啥自动收集统计信息老失败,手动收集就能成功?

| username: 我是咖啡哥

【TiDB Usage Environment】Production Environment
【Reproduction Path】What operations were performed when the issue occurred
【Encountered Issue: Issue Phenomenon and Impact】
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】
SELECT * FROM INFORMATION_SCHEMA.ANALYZE_STATUS WHERE table_name=‘XXX’;

For the same table, which has over 200 million rows, the automatic collection never succeeds, reporting an error after 20 minutes.
Manual collection, however, completes in just 11 minutes. The job info content is the same.

auto analyze table all columns with 256 buckets, 500 topn, 0.0004305733879263856 samplerate
analyze table all columns with 256 buckets, 500 topn, 0.0004305733879263856 samplerate

What is the difference between automatic collection and manual collection? Why is there such a big difference?

| username: db_user | Original post link

It indeed affects 6.1.

| username: xingzhenxiang | Original post link

Comparing the parameters of manual and automatic concurrency, you can see the differences. The automatic settings are generally lower.

| username: 我是咖啡哥 | Original post link

I haven’t manually set it. Which parameter are you referring to?

| username: xingzhenxiang | Original post link

| tidb_auto_build_stats_concurrency | 1 |
| tidb_build_stats_concurrency | 8 |

| username: 我是咖啡哥 | Original post link

Why does my version only have | tidb_build_stats_concurrency | 4 | and not the auto one?

| username: 我是咖啡哥 | Original post link

Oh, I saw the documentation. It was added in version 6.5.

| username: Alex920 | Original post link

Did you encounter a bug?

| username: weixiaobing | Original post link

The automatically collected concurrency is set to 1, so the speed is relatively slow. It reports an error after 20 minutes. Check the GC time to see if it exceeds the GC limit.

| username: xingzhenxiang | Original post link

Alright, I just upgraded to this version.

| username: 我是咖啡哥 | Original post link

Where can I see that the automatically collected concurrency is 1? I just want to know where this difference lies.

| username: weixiaobing | Original post link

You can set all the parameters below to 1 and manually collect the data. The execution time should be similar to the automatic collection time.

Control ANALYZE Concurrency

When executing the ANALYZE statement, you can adjust the concurrency through some parameters to control the impact on the system.

tidb_build_stats_concurrency

Currently, when ANALYZE is executed, it is divided into small tasks, each responsible for a specific column or index. tidb_build_stats_concurrency controls the number of tasks executed simultaneously, with a default value of 4.

tidb_distsql_scan_concurrency

When analyzing ordinary column tasks, tidb_distsql_scan_concurrency can be used to control the number of Regions read at one time, with a default value of 15.

tidb_index_serial_scan_concurrency

When analyzing index column tasks, tidb_index_serial_scan_concurrency can be used to control the number of Regions read at one time, with a default value of 1.

| username: 我是咖啡哥 | Original post link

Are these parameter values all 1 when automatically collected? Which document mentions this?

| username: Running | Original post link

No similar issues encountered on 6.5.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.