How long does CLUSTER_STATEMENTS_SUMMARY_HISTORY record slow logs?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: CLUSTER_STATEMENTS_SUMMARY_HISTORY记录多长时间的慢日志?

| username: 逍遥_猫

[TiDB Usage Environment] v4, v6
[Overview] Query TOP 20 slow SQL
[Question] The CLUSTER_STATEMENTS_SUMMARY table records slow logs by digest. According to the official documentation, this table records data for half an hour. Logs older than half an hour are recorded in the CLUSTER_STATEMENTS_SUMMARY_HISTORY history table. The history table, based on the parameters tidb_stmt_summary_refresh_interval=1800 and tidb_stmt_summary_history_size=24, retains the last 12 hours of historical data. However, in both the CLUSTER_STATEMENTS_SUMMARY and CLUSTER_STATEMENTS_SUMMARY_HISTORY tables, I found:

MySQL [information_schema]> select min(SUMMARY_BEGIN_TIME) from CLUSTER_STATEMENTS_SUMMARY;
+-------------------------+
| min(SUMMARY_BEGIN_TIME) |
+-------------------------+
| 2022-05-28 16:30:00     |

select min(SUMMARY_END_TIME) from CLUSTER_STATEMENTS_SUMMARY;
+-----------------------+
| min(SUMMARY_END_TIME) |
+-----------------------+
| 2022-05-28 17:00:00   |
+-----------------------+
1 row in set (0.004 sec)

This time is the start time of the cluster service. So, aren’t these two tables collecting slow SQL from 2022-05-28 16:30:00? But the record counts in these two tables are different.

| username: h5n1 | Original post link

The image is not visible. Please provide the text you need translated.

| username: 逍遥_猫 | Original post link

tidb_stmt_summary_history_size=24 saves the history count of each type of SQL. I understand it as saving the most recent 24 occurrences of this type of SQL from the start of the cluster to the current time, not the 12 hours of historical data mentioned in the official documentation. Am I understanding this correctly?

| username: 逍遥_猫 | Original post link

Is this for each type by digest? Or for each SQL?

| username: h5n1 | Original post link

Its classification is based on the same sql_digest and plan_digest, indicating it is the same SQL. Since the records in statements_summary are aggregated every half hour (from SUMMARY_BEGIN_TIME to SUMMARY_END_TIME), there are 24 time periods in history, which equals 12 hours. statements_summary is cleaned up based on the number of entries, while history is cleaned up based on the number of half-hour time periods retained.

This design in TiDB doesn’t feel particularly user-friendly; Oracle does a better job in this regard. However, TiDB’s slow log is very detailed and records the execution plan.

| username: 逍遥_猫 | Original post link

“The ‘24 time periods in history’ means 12 hours, but querying ‘SELECT MIN(SUMMARY_BEGIN_TIME) FROM CLUSTER_STATEMENTS_SUMMARY;’ returns the SUMMARY_BEGIN_TIME as the cluster startup time. I don’t understand it. :joy:

| username: buddyyuan | Original post link

If your statements are fewer than 3000, this situation is normal. This is because the 12 hours is just the mandatory retention time, not the cleanup time. The actual cleanup happens when it exceeds 3000.

| username: 逍遥_猫 | Original post link

Got it, thanks everyone! :+1:

| username: 逍遥_猫 | Original post link

I found the original text: Statement Summary Tables | PingCAP 文档中心

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.