Loss of Statistics Causing Index Usage Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 统计信息丢失,造成索引使用问题

| username: TiDBer_Zw6TA6E3

[TiDB Usage Environment: Production Environment]
[TiDB Version: Version 6.5]
[Encountered Issue: Data statistics information inexplicably disappears and only reappears after manually running ANALYZE TABLE xxx. This leads to incorrect index usage, causing severe resource consumption in the cluster and even making it unusable. PS: After running for a certain period (e.g., normal in the afternoon, issues arise at midnight), SHOW STATS_HEALTHY is empty until the ANALYZE TABLE statement is manually rerun.]
[Resource Configuration: TIDB server + PD server 16 cores 32GB, TIKV * 3 32 cores 64GB]

| username: Billmay表妹 | Original post link

Here are some possible reasons and solutions:

  1. TiDB automatic statistics feature is not enabled or configured incorrectly: TiDB has automatic statistics enabled by default, but if your TiDB version is old or configured incorrectly, it may cause the automatic statistics feature to be disabled or not work properly. You can check whether the automatic statistics feature is enabled in the TiDB configuration file and whether the related configuration parameters are correct.
  2. The frequency of automatic statistics updates is not high enough: TiDB’s default frequency for automatic statistics updates is relatively low, which may lead to statistics loss. You can try adjusting the relevant parameters in the TiDB configuration file to increase the frequency of automatic statistics updates.
  3. Manual statistics updates are not timely: If there are a lot of data changes in your TiDB cluster, it may cause the statistics to not be updated in time. You can try manually updating the statistics to ensure their timeliness.

You can also check the following content:

| username: TiDBer_Zw6TA6E3 | Original post link

Question 1: If automatic ANALYZE is enabled and the current ANALYZE version is 2, will an OOM (Out of Memory) during the ANALYZE process on a table cause SHOW STATS_HEALTHY to be empty?

Question 2: The core question is why STATS_HEALTHY becomes empty after running for a period of time, and only reappears after manually running ANALYZE TABLE (after which the cluster usage returns to normal). This doesn’t seem to be related to automatic ANALYZE, because even if I manually execute ANALYZE, why does it disappear again after a period of time?

| username: Billmay表妹 | Original post link

If the TiDB automatic ANALYZE feature is enabled and the ANALYZE version is 2, an OOM (Out of Memory) during the ANALYZE process may cause SHOW STATS_HEALTHY to be empty. This is because during the ANALYZE process, TiDB will occupy a certain amount of memory resources, and if the memory resources are insufficient, it may cause ANALYZE to fail or OOM, resulting in SHOW STATS_HEALTHY being empty.

Additionally, if the statistics of a table are lost or inaccurate, it may cause SHOW STATS_HEALTHY to be empty. In this case, you can manually run the ANALYZE TABLE statement to update the statistics to ensure the accuracy of SHOW STATS_HEALTHY.

In summary, SHOW STATS_HEALTHY being empty may be due to various reasons, including lost statistics, ANALYZE failure, or OOM, among others.

| username: TiDBer_Zw6TA6E3 | Original post link

Okay, thank you! I have a table that is indeed very large. I will try to remove this table to avoid the OOM issue during ANALYZE and then observe the results. Thanks again!

| username: Billmay表妹 | Original post link

In a production environment, it is best to deploy PD and TiDB separately.

I’m not sure if the TiDB server and PD server with 16 cores and 32GB are deployed and operated on the same server. If you have performance requirements, you might need to deploy them separately.

| username: Billmay表妹 | Original post link

Got it, I hope you can solve this problem~

| username: TiDBer_Zw6TA6E3 | Original post link

Is there any way to support viewing the list of SQLs being executed when the CPU of the current TIDB server exceeds 40%? This would be useful for troubleshooting which SQLs are causing the CPU spike. Thank you!

| username: Billmay表妹 | Original post link

You can view the longest-running SQL statements in the current TiDB cluster, along with their execution plans and statistics, through the Top SQL page in TiDB Dashboard. Additionally, TiDB Dashboard provides tools like CPU Profile and Flame Graph, which can help you analyze the execution of SQL statements more deeply and identify the causes of CPU spikes.

The specific steps are as follows:

  1. Open TiDB Dashboard and go to the Top SQL page.
  2. Select the time range and SQL type you want to view at the top of the page.
  3. View the longest-running SQL statements and click the “View Execution Plan” button to see the execution plan and statistics of the SQL statements.
  4. If you need to analyze the execution of SQL statements more deeply, you can use the CPU Profile and Flame Graph tools. On the Top SQL page, click the “CPU Profile” or “Flame Graph” button next to the SQL statement to open the corresponding tool interface.

In the CPU Profile and Flame Graph tool interfaces, you can see the CPU consumption of each function during the execution of the SQL statement and the calling relationships between functions. By analyzing this information, you can identify the causes of CPU spikes and make corresponding optimizations.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.