After a TiDB SQL statement finishes, memory is released very slowly

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb sql执行完成之后,内存释放很忙

| username: Hacker_suny

[TiDB Usage Environment] Production Environment
[TiDB Version] 4.0.11
[Reproduction Path] Complex queries, querying large amounts of data, and performing statistical aggregation operations
[Encountered Problem: Symptoms and Impact] The SQL consumes a lot of memory, and the memory is released very slowly after the query completes. What causes this, and is there a way to speed up memory release?
[Resource Configuration] 16C32G
[Attachments: Screenshots/Logs/Monitoring]

| username: 人如其名 | Original post link

What you perceive as slow might not actually be slow. After the statement execution is completed, it is quickly garbage collected and returned to the memory allocator. At this point, you might mistakenly think that TiDB is occupying a large amount of memory without releasing it. In reality, this portion of memory can be used by other statements. If there are no other statements for a long time, the memory allocator will gradually return the memory to the operating system. Therefore, there is no need to worry about slow memory release; the focus should be on optimizing SQL.

| username: Kongdom | Original post link

Is that really how the internal logic works? When we run daily settlement jobs, the statements are very simple, but because of the large data volume the same thing happens: in the Grafana charts, memory is released very slowly.

| username: Hacker_suny | Original post link

This doesn’t seem reasonable. Suppose I have 32 GB of memory and a single SQL query consumes nearly 28 GB. If I run the next SQL query immediately after the first one ends, memory will fill up and cause an OOM (Out of Memory).

| username: 人如其名 | Original post link

At that moment the previous query’s memory may not have been GC’d yet, so an OOM is possible. In practice, though, even back-to-back sequential execution is rarely faster than GC, so it usually doesn’t happen. It still takes case-by-case analysis, using a heap profile or trace, to see where exactly the memory is being held.

| username: xingzhenxiang | Original post link

My approach to handling this in production is to kill SQL queries that use too much memory. Before killing them, I record the SQL and then analyze and optimize it.
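TiDB also has a built-in guard with the same shape as this workflow: a per-query memory quota (the `tidb_mem_quota_query` session variable, with `oom-action` deciding whether to log or cancel the statement). The core check, comparing tracked usage against a quota and acting on it, can be sketched in Go. This is a simplified stand-in: the `overQuota` helper and the whole-process heap check are assumptions for illustration, whereas TiDB tracks memory per statement, not process-wide.

```go
package main

import (
	"fmt"
	"runtime"
)

// overQuota reports whether the process heap currently exceeds
// quotaBytes. TiDB's real check runs against a per-statement memory
// tracker; using whole-process HeapInuse here is a simplification.
func overQuota(quotaBytes uint64) bool {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse > quotaBytes
}

func main() {
	const quota = 1 << 30 // hypothetical 1 GiB quota
	if overQuota(quota) {
		// Matching the workflow above: record the offending SQL
		// first, then kill the connection running it.
		fmt.Println("over quota: record the SQL, then kill it")
	} else {
		fmt.Println("within quota: let the statement run")
	}
}
```

Setting the quota low enough turns the manual "record, then kill" routine into an automatic cancel, which also protects against the back-to-back OOM scenario discussed earlier in the thread.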