Scheduling Logic of auto_analyze

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: auto_analyze 调度逻辑

| username: TiDBer_rHlpKEY6

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v6.5.
[Issue]
When tidb_auto_analyze is enabled in TiDB, the analyze tasks are always scheduled to one server (this server has pd+tikv+tidb).
Question 1: What is the scheduling logic for auto_analyze?
Question 2: Can the running server be specified (in the case of auto_analyze)?

| username: Billmay表妹

Question 1: TiDB's automatic statistics collection (auto analyze) is driven by a background goroutine inside the TiDB Server. This goroutine periodically scans all tables to decide whether their statistics need to be collected. When a table needs it, another goroutine is started on the same TiDB Server to run the collection task. The scheduling logic of auto analyze therefore lives entirely inside the TiDB Server and does not involve scheduling by TiKV or PD.
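
To check which server actually runs the tasks, you can look at the analyze task records. This is a minimal sketch, assuming a TiDB version (v6.1 or later, as far as I know) where SHOW ANALYZE STATUS reports cluster-wide tasks with an Instance column. Verify the exact columns against your own version:

-- List recent analyze tasks; Job_info is expected to mark automatic ones,
-- and Instance shows the TiDB Server that ran each task
SHOW ANALYZE STATUS;

-- Show the variables that control the auto analyze time window and trigger ratio
SHOW GLOBAL VARIABLES LIKE 'tidb_auto_analyze%';

If every automatic task reports the same Instance, that confirms the behavior described in the question.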

Question 2: With auto analyze, the running server cannot be specified. TiDB decides internally whether to run the statistics collection task and on which TiDB Server it runs. If you need to choose the server, use manual statistics collection (manual analyze): connect to the TiDB Server you want and run the ANALYZE TABLE command there. You can also set the tidb_analyze_version global variable to choose the statistics version. For example:

-- Set the statistics version to 1
SET GLOBAL tidb_analyze_version=1;

-- Manually execute the statistics collection task
ANALYZE TABLE table_name;

Manual statistics collection runs on whichever TiDB Server the session is connected to, so you choose the running server by connecting to it before running ANALYZE TABLE.
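
A sketch of that workflow, assuming table_name is a placeholder and that your client is already connected to the TiDB Server that should do the work. The cluster_info query is only there to help pick an instance:

-- List the TiDB Servers in the cluster so you can pick one to connect to
SELECT instance, version FROM information_schema.cluster_info WHERE type = 'tidb';

-- Run the collection on the server this session is connected to
ANALYZE TABLE table_name;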