TiDB OOM Causes Restart

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB OOM导致重启

| username: TiDBer_qqabzOs3

[TiDB Usage Environment]
Production Environment
[TiDB Version]
v6.4.0
[Encountered Problem]
TiDB experienced a sudden increase in memory usage, triggering an OOM event. Upon checking the slow logs, it was identified that an internal SQL execution caused this. The SQL content is as follows:
SELECT
HIGH_PRIORITY table_id,
is_index,
hist_id,
distinct_count,
version,
null_count,
cm_sketch,
tot_col_size,
stats_ver,
correlation,
flag,
last_analyze_pos
FROM
mysql.stats_histograms;

What is the meaning of this SQL, under what circumstances does it occur, and how can this issue be avoided?

| username: hey-hoho | Original post link

This is the histogram information for table lookup, related to statistics. The execution frequency of this SQL is not very high, and the memory usage is not large, so it should not be the cause of the OOM. Let’s look for other SQLs.

| username: buddyyuan | Original post link

Generally, the memory-usage-alarm-ratio is set. When this threshold is exceeded, it will dump the goroutine, heap, and running_sql from memory to a file. You can check these files to see what exactly is causing the issue.

| username: tidb菜鸟一只 | Original post link

It shouldn’t be this SQL causing the issue. This SQL is used for collecting statistical information. I suggest checking other SQL queries.

| username: TiDBer_qqabzOs3 | Original post link

However, from the slow query log, comparing the OOM time, only this statement matches. Apart from this SQL consuming hundreds of MB of memory, all other SQL statements are in KB.

| username: TiDBer_qqabzOs3 | Original post link

This is deployed on K8S. TiDB is stateless, and the logs are gone after a restart.

| username: TiDBer_qqabzOs3 | Original post link

However, from the slow query logs, only this SQL query has the highest memory usage, nearly 300MB, while the largest memory usage for other SQL queries is less than 50MB. Additionally, the OOM (Out of Memory) event timing also matches only this SQL query.

| username: jansu-dev | Original post link

As buddyyuan mentioned, it is still necessary to look at the behavior of the heap and log to see if it matches the invocation of this SQL. Currently, it is suspected that the OOM is related to this SQL, but it cannot be proven.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.