Why does the statements_summary_history table retain historical records for too long?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 为什么statements_summary_history表会保留过长的历史记录

| username: zzw6776

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5

According to the documentation, this table should only retain 12 hours of data when all values are kept at their default settings. The console display should also only retain 12 hours.

However, the actual data in the table can be traced back to March.

Is this a bug or is the documentation unclear?

| username: tidb菜鸟一只 | Original post link

How many records are there in this table?

| username: zzw6776 | Original post link

More than 4 weeks, the data from March has continued until now.

| username: zhanggame1 | Original post link

The documentation says, “To ensure the timeliness of monitoring metrics, the data in statements_summary is regularly cleared and only shows aggregated results from a recent period. The clearing interval is set by the system variable tidb_stmt_summary_refresh_interval. If a query is made right after the clearing, the displayed data might be minimal.”

It doesn’t mention clearing statements_summary_history. Since it’s called history, it should retain a longer historical record.

| username: zzw6776 | Original post link

Read the entire document.
After the above configuration takes effect, statements_summary will be cleared every 30 minutes, so statements_summary_history saves the history of the last 12 hours. statements_summary_evicted saves the records of the last 24 time periods where eviction occurred; statements_summary_evicted uses 30 minutes as a recording cycle, with a table capacity of 24 time periods.

| username: tidb菜鸟一只 | Original post link

SHOW GLOBAL VARIABLES LIKE ‘%tidb_stmt%’;
Take a look at these parameters. My understanding is that only when the value exceeds tidb_stmt_summary_max_stmt_count will it be deleted based on time.

| username: zhanggame1 | Original post link

I was mistaken, but I checked two TiDB clusters, and the data in this table has been there for many days, definitely not deleted after 12 hours.

| username: zzw6776 | Original post link

Yes, it’s the same in both of my environments. It seems that there are very few records saved beyond 12 hours, only a few dozen. However, within the 12-hour time frame, there are hundreds of records.

I guess either the documentation is wrong, or there’s a bug.

| username: zhanggame1 | Original post link

The default value of tidb_stmt_summary_max_stmt_count is 3000.

I have 4890 entries here.

| username: 小王同学Plus | Original post link

This should be a case of the documentation not being clear.

If there is no load for a long period of time, there might be data from a long time ago. The accurate description should be that for each type of SQL statement, data from the most recent 24 time windows (each window is half an hour by default) is saved. However, if a certain SQL statement appeared half a month ago and hasn’t appeared much since, and the total number of SQL statements in the cluster does not exceed tidb_stmt_summary_max_stmt_count (default is 3000), then data from half a month ago can still be found.

If convenient, you can suggest a modification to the official documentation~

| username: redgame | Original post link

Change tidb_stmt_summary_max_stmt_count

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.