TiDB Chose the Wrong Index & Statistics Collection Failed: Query Execution Was Interrupted

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB走错索引&统计信息收集失败Query execution was interrupted

| username: jiaxin

[TiDB Usage Environment]
Production environment, TiDB version 6.1.5

[Problem Description]
On the afternoon of March 28th, a business application ran an update task, using the wrong index, causing the TiDB compute node to continuously experience OOM.


Upon investigation, the table’s statistics were at 55, which is not optimal, so we manually collected the table’s statistics;

mysql> SHOW STATS_HEALTHY where table_name like 'xxxx';
+----------------+------------------+----------------+---------+
| Db_name        | Table_name       | Partition_name | Healthy |
+----------------+------------------+----------------+---------+
| xxxx  | xxx  |                |      55 |
+----------------+------------------+----------------+---------+

Statistics version is 1
mysql>  show global variables like '%tidb_analyze_version%';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| tidb_analyze_version | 1     |
+----------------------+-------+
1 row in set (0.00 sec)

The table has approximately 6.6 billion rows, with data + index size of 2.4T

At 5:30 AM, the statistics collection failed with the error [executor:1317] Query execution was interrupted. The memory of the TiDB compute node that failed to collect statistics remained stable and not high.

In the morning, manually collected the specified joint index statistics of the table (in progress, not failed) ANALYZE TABLE xxx INDEX idx_standard_data_ct_vid_dt_sid;

Heatmap of the timeline not using the correct index

| username: tidb菜鸟一只 | Original post link

The table is too large, and the statistics cannot be collected. I suggest referring to https://docs.pingcap.com/zh/tidb/v5.4/statistics#全量收集. I think it should be collected proportionally. For example, WITH 0.1 SAMPLERATE only collects 10% of the data. Additionally, I suggest adjusting the concurrency of the analyze operation.

image

Increase it appropriately without affecting production applications.

| username: jiaxin | Original post link

Concurrency has already increased more than fourfold to 18 this morning :rofl:

mysql> show global variables like '%tidb_build_stats_concurrency%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| tidb_build_stats_concurrency | 18    |
+------------------------------+-------+
| username: WalterWj | Original post link

In fact, you can add bind to solve your problem, at least to make the execution plan correct first: CREATE [GLOBAL|SESSION] BINDING | PingCAP 文档中心

| username: tidb菜鸟一只 | Original post link

Reduce the sampling rate a bit, or directly use SQL hints or bind execution plans.

| username: 数据库真NB | Original post link

Reduce the concurrency, decrease the log collection frequency by 10 times, and give it a try.

| username: jiaxin | Original post link

  • The update has never been successfully executed, the application timed out and terminated, and it was not possible to create a binding through the historical execution plan.
  • Creating a forced index bind based on the SQL did not take effect, and it only started using the index normally after collecting the table index statistics.
mysql> ANALYZE TABLE xxxx_data INDEX idx_standard_xxxx;

Query OK, 0 rows affected (4 hours 16 min 57.41 sec)
| username: zhanggame1 | Original post link

It might be possible to upgrade the cluster version; later versions have made many optimizations to statistics collection.

| username: TIDB-Learner | Original post link

What is the maximum execution time configuration? It can also be appropriately increased.

| username: DBAER | Original post link

Was this screenshot feature self-developed or is it part of the enterprise edition?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.