Analyze in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb的analyze

| username: TiDBer_kk6r8q3d

[Test Environment] Production Environment / Test / POC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

Test environment: 126 GB memory, 40 cores, 3.6 TB disk (ordinary spinning disk, not SSD).
PD v6.5.2
A table with about 2.989 billion rows was created, containing 49 bigint columns (10 of which have values), 9 text (2 with values), 4 datetime (all with values), 9 varchar (all with values), and 2 decimal (both with values).
Data loading has been stopped for 2 days, yet the table is still analyzed daily, and each ANALYZE TABLE run takes about 12 hours.
If automatic analyze is turned off, the table will never be analyzed again, which may lead to poor query performance;
if it is left on, and there are multiple large tables in production, the system will spend every day analyzing tables.
Is it possible to re-analyze a table only after a certain percentage of its rows have been inserted, deleted, or updated?
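For reference, a sketch of the knobs involved in the question above (variable names are from TiDB's system-variable documentation; the specific values are only illustrations, not recommendations):

```sql
-- Check the current trigger threshold and the allowed auto-analyze time window.
SHOW VARIABLES LIKE 'tidb_auto_analyze_%';

-- Trigger auto analyze only after 10% of rows have changed (illustrative value).
SET GLOBAL tidb_auto_analyze_ratio = 0.1;

-- Restrict auto analyze to a nightly maintenance window (illustrative times).
SET GLOBAL tidb_auto_analyze_start_time = '01:00 +0800';
SET GLOBAL tidb_auto_analyze_end_time = '05:00 +0800';
```

Raising the ratio reduces how often large tables are re-analyzed; the time window keeps the 12-hour runs out of business hours.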

| username: tidb菜鸟一只 | Original post link

The variable tidb_auto_analyze_ratio controls how large a data change must be before statistics are collected automatically. Check its current value, and also run SHOW ANALYZE STATUS to see whether the automatic statistics collection task for this table is failing every day. If automatic collection keeps failing, it is recommended to collect the statistics manually:

SET tidb_build_stats_concurrency=16;
SET tidb_distsql_scan_concurrency=64;
ANALYZE TABLE table_name WITH 0.1 SAMPLERATE; -- sample 10% of the table's rows

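To check whether the daily auto-analyze job on this table is failing, something like the following can be used (the table name is a placeholder; TiDB's SHOW statements accept a WHERE clause):

```sql
-- Inspect recent analyze jobs; look at the State and Fail_reason columns.
SHOW ANALYZE STATUS WHERE Table_name = 'table_name';
```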
| username: xfworld | Original post link

A manually executed ANALYZE TABLE and the automatically triggered analyze run with different degrees of parallelism, so their execution speeds differ significantly.

If permissible, you can keep auto analyze enabled and supplement it with scripts that run ANALYZE TABLE manually, to cover different business scenarios.

If auto analyze is disabled, statistics collection is affected, which may cause the optimizer to choose a bad execution plan. Of course, you can also use hints to pin the execution plan directly…

It is recommended to test based on your actual situation to see which method is more suitable.

| username: redgame | Original post link

tidb_auto_analyze_ratio: This parameter indicates the percentage of modified rows in a table that will trigger an automatic table analysis task. The default value is 0.5, meaning that automatic analysis is triggered when the number of modified rows exceeds 50% of the total rows.
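As a concrete illustration on the poster's table: with 2.989 billion rows and the default ratio of 0.5, roughly 1.49 billion rows would have to change before auto analyze fires. Lowering the threshold makes collection more frequent (the 0.05 value below is illustrative only):

```sql
-- Check the current ratio.
SELECT @@GLOBAL.tidb_auto_analyze_ratio;

-- Trigger auto analyze after ~5% of rows change instead of 50% (illustrative).
SET GLOBAL tidb_auto_analyze_ratio = 0.05;
```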

| username: TiDBer_kk6r8q3d | Original post link

Fail_reason: [tikv:1317]Query execution was interrupted
All the recent errors I've seen are this one, so statistics collection is indeed failing.

| username: h5n1 | Original post link

Warning: locking statistics is currently an experimental feature and is not recommended for production environments.

Starting from v6.5.0, TiDB supports locking statistics. Once a table's statistics are locked, they cannot be modified and ANALYZE cannot be run on that table. Usage:
mysql> LOCK STATS t;
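For completeness, a sketch of locking and unlocking statistics (statement forms as in the TiDB docs for this feature; `t` is a placeholder table name, and availability of SHOW STATS_LOCKED may depend on the exact version):

```sql
-- Lock statistics so neither auto nor manual ANALYZE modifies them.
LOCK STATS t;

-- Check which tables currently have locked statistics.
SHOW STATS_LOCKED;

-- Release the lock when regular collection should resume.
UNLOCK STATS t;
```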

| username: cassblanca | Original post link

There is a parameter that can control the threshold of table data modification rate before triggering the statistics collection operation.

| username: 郑旭东石家庄 | Original post link

I am using v7.5.1 with tidb_auto_analyze_ratio set to 0.5. Yet judging by the analyze frequency shown in the management platform, my table only changes by a few million rows a day, nowhere near 50% of the table daily, but the analyze still runs.
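One way to see what the auto-analyze logic is working from is to compare modify_count against the row count in mysql.stats_meta (column names as in current TiDB versions; the ratio below is an approximation of the trigger check, not its exact implementation):

```sql
-- modify_count / count is roughly the change ratio compared against
-- tidb_auto_analyze_ratio; large values explain frequent re-analysis.
SELECT table_id, modify_count, `count`,
       modify_count / `count` AS change_ratio
FROM mysql.stats_meta
ORDER BY modify_count DESC
LIMIT 10;
```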

| username: zhang_2023 | Original post link

You can set the percentage through parameters.

| username: zhh_912 | Original post link

Is this a re-table?

| username: FutureDB | Original post link

Given TiDB's current efficiency at collecting statistics, for large tables like this it is more reliable to bind the execution plan with hints. If the optimizer picks a wrong plan because of stale statistics or other reasons, the impact on queries against such large tables can be significant.
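A sketch of pinning a plan with a global binding, one way to apply the hint-based approach suggested above (TiDB's CREATE BINDING syntax; the table `t`, index `idx_a`, and query shape are placeholders):

```sql
-- Force queries of this shape to use a specific index via a binding,
-- so a bad plan caused by stale statistics cannot be chosen.
CREATE GLOBAL BINDING FOR
    SELECT * FROM t WHERE a = 1
USING
    SELECT /*+ USE_INDEX(t, idx_a) */ * FROM t WHERE a = 1;

-- List the bindings currently in effect.
SHOW GLOBAL BINDINGS;
```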

| username: zhaokede | Original post link

Automatic ANALYZE TABLE is scheduled according to the system's load conditions and does not consume many resources.

| username: yytest | Original post link

You can check the data modification rate parameter and try adjusting it.

| username: 舞动梦灵 | Original post link

That’s right. My table has over a billion rows; it can’t wait for seven or eight hundred million row changes before collecting statistics.