Issue of Uneven Command Execution Load After Enabling tidb_enable_auto_analyze

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb_enable_auto_analyze开启后命令执行负载不均衡的问题

| username: TiDBer_cCxPj52F

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.2
[Reproduction Path]
tidb_enable_auto_analyze on

[Encountered Problem: Phenomenon and Impact]
Analyze is configured to run automatically, but the analyze statements are all assigned to the same tidb-server instance, causing high load on that instance. Is there a strategy to balance such tasks across all tidb-server instances?

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]
The logs of one tidb-server instance contain a large number of entries like the following, roughly one per minute; they do not appear on the other tidb-server nodes:
[2023/10/30 16:27:13.435 +08:00] [WARN] [expensivequery.go:118] [expensive_query] [cost_time=720.847320279s] [conn=5289488757462925315] [txn_start_ts=0] [mem_max="140378499 Bytes (133.9 MB)"] [sql="analyze table XXX.XXX"]

| username: tidb菜鸟一只 | Original post link

This is single-threaded; whichever tidb-server the analyze starts on, it will run there to completion.

| username: TiDBer_cCxPj52F | Original post link

Is it assigned to the same node every time? Basically, there’s one every few minutes.

| username: 有猫万事足 | Original post link

Yes, like DDL, it is executed on a single node.
The distributed task framework was only recently introduced; as of version 7.4 it supports only add index and import tasks.

Support for analyze statements will probably take some more time.

| username: 有猫万事足 | Original post link

Uh, so the issue from this morning was traced back to auto analyze through Top SQL?

If that's the case, we might as well restrict auto analyze to a time window so that the task runs during business idle hours.
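For example, something along these lines (the window below is only an illustration; pick your own off-peak hours, and the +0800 offset assumes a CST timezone):

-- Only allow auto analyze between 01:00 and 05:00 local time
SET GLOBAL tidb_auto_analyze_start_time = '01:00 +0800';
SET GLOBAL tidb_auto_analyze_end_time = '05:00 +0800';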

| username: TiDBer_cCxPj52F | Original post link

Yes, it was eventually traced back to this cause. The cluster has always been very busy; one table has 50% of its data updated every minute, so analyze runs constantly, which ended up crashing one of the tidb-server instances.

| username: tidb菜鸟一只 | Original post link

Limit the time window and trigger ratio of automatic statistics collection, or collect statistics for that table manually instead.
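A rough sketch of both options (the table name is a placeholder; tidb_auto_analyze_ratio is the fraction of modified rows that triggers auto analyze):

-- Raise the trigger threshold so auto analyze fires less often on hot tables
SET GLOBAL tidb_auto_analyze_ratio = 0.8;
-- Or collect statistics for the busy table manually during an idle period
ANALYZE TABLE mydb.hot_table;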

| username: Fly-bird | Original post link

The analyze runs on a single tidb-server node each time; you can increase that node's hardware configuration.

| username: zhanggame1 | Original post link

There is no good solution; you can only improve the hardware configuration.

| username: cassblanca | Original post link

You can use a few parameters to make automatic statistics collection run when the system is relatively idle, minimizing the impact on system load.
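One such knob, assuming a TiDB version (v6.1 or later) where tidb_max_auto_analyze_time is available, caps how long a single auto analyze run may take before it is terminated:

-- Abort any auto analyze task that runs longer than 1 hour (0 means no limit)
SET GLOBAL tidb_max_auto_analyze_time = 3600;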

| username: Kongdom | Original post link

We run auto analyze only in the early morning; it is not allowed during the day.

| username: h5n1 | Original post link

Lock the statistics. Even with many updates, if the data distribution and skew do not change much, it is fine not to re-collect them.
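A minimal sketch, assuming TiDB v6.5 or later where LOCK STATS is available (the table name is a placeholder):

-- Collect statistics once while the distribution is representative, then freeze them
ANALYZE TABLE mydb.hot_table;
LOCK STATS mydb.hot_table;
-- If the distribution later changes significantly, unlock and re-analyze
UNLOCK STATS mydb.hot_table;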

| username: TiDBer_小阿飞 | Original post link

You can avoid performing analysis during peak business hours by setting it up to automatically analyze at night.

| username: Jolyne | Original post link

You can schedule analyze, adjust the table's modification ratio, and tune the analyze time window and concurrency. The following parameters are worth noting; a combined example follows the list:
tidb_auto_analyze_ratio
tidb_auto_analyze_start_time
tidb_auto_analyze_end_time
tidb_build_stats_concurrency
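
Putting these together, an illustrative sequence (values are examples only and should be tuned to the workload; check the current settings first):

-- Inspect the current values before changing anything
SHOW GLOBAL VARIABLES LIKE 'tidb_auto_analyze%';
SHOW GLOBAL VARIABLES LIKE 'tidb_build_stats_concurrency';
-- Trigger auto analyze only after 80% of rows have changed
SET GLOBAL tidb_auto_analyze_ratio = 0.8;
-- Restrict auto analyze to a nightly window
SET GLOBAL tidb_auto_analyze_start_time = '01:00 +0800';
SET GLOBAL tidb_auto_analyze_end_time = '05:00 +0800';
-- Concurrency used when building statistics (higher finishes faster but uses more CPU)
SET GLOBAL tidb_build_stats_concurrency = 4;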