How the Value of Table Health is Calculated Using the TIDB show stats_healthy Command

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB show stats_healthy命令查看表健康度的值是如何计算的

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.7
[Encountered Problem: Problem Phenomenon and Impact] By executing show stats_healthy to check the health of the table, I want to know how this health value is calculated and which parameters can affect this calculation result.

| username: 大飞哥online | Original post link

When modify_count is greater than or equal to row_count, the health is 0;
When modify_count is less than row_count, the health is (1 - modify_count / row_count) * 100.
modify_count represents the number of modified rows in the table, and row_count represents the total number of rows in the table.

| username: residentevil | Original post link

This value changes very quickly, causing the online SQL execution plan to be inaccurate.

| username: 大飞哥online | Original post link

The health value of the table changes very quickly, which is probably due to the high frequency of table modifications.
You must have modified a lot of data, right?

| username: 大飞哥online | Original post link

You can adjust the frequency of statistics collection and reduce it.

| username: residentevil | Original post link

The main operations online are INSERT+DELETE, which are indeed related to data changes.

| username: 大飞哥online | Original post link

Then you can schedule the statistics to be collected during off-peak hours.

| username: 路在何chu | Original post link

Write a script to collect and analyze statistics in the middle of the night.

| username: 大飞哥online | Original post link

Without scripts, ANALYZE TABLE can be scheduled as a cron job.

| username: 大飞哥online | Original post link

By default, it can run at any time during the day, but it can be changed to run during off-peak business hours.

| username: Kongdom | Original post link

To be honest, writing a script is a bit safer, providing double protection.

| username: 路在何chu | Original post link

It will have problems after it automatically collects.

| username: 大飞哥online | Original post link

True, double insurance.

| username: 大飞哥online | Original post link

Then you can turn off automatic updates and use a script to handle it.

| username: 逍遥_猫 | Original post link

You can choose an optimized SQL statement based on experience or by testing with EXPLAIN ANALYZE, and bind the optimized SQL statement with the SQL statement executed by the business code through SQL BINDING.

| username: residentevil | Original post link

The ANALYZE TABLE operation takes a long time to execute because the sampled rows are usually around 10%. My cluster has a total of more than 4000 tables, and it takes a whole day to run through them once. :joy:

| username: 像风一样的男子 | Original post link

Write a script to regularly analyze commonly used tables that are frequently inserted and updated, and set a time period for automatic analysis for other tables.

| username: residentevil | Original post link

There is quite a big difference between TiDB and native MySQL in terms of how they collect statistics.

| username: Kongdom | Original post link

After all, one is a distributed database, and the other is a monolithic database.

| username: zhanggame1 | Original post link

There is also a big difference between the KV structure and row storage.