Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb 无法限制住内存 (TiDB memory usage cannot be limited)
Version: TiDB 4.0.12
Background:
TiDB is currently deployed on the same machine as other databases. The memory limit for the TiDB server is set with server-memory-quota to 24 GB, and the maximum memory for a single SQL query is set with tidb_mem_quota_query to 1 GB. It had been running smoothly, with memory usage staying below 10 GB. However, on January 18th, between 18:20 and 18:35, memory usage suddenly spiked to nearly 200 GB, which crashed the entire machine and affected the other database services.
Monitoring during that period:

Slow log information during that period:
The tidb.log contains expensive_query entries. I spot-checked some of the SQL queries and found, for example, [mem_max="5738940926 Bytes (5.34 GB)"].
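For anyone who wants to check how many logged statements went past the per-query quota, here is a minimal Python sketch. It assumes only the mem_max field format quoted in the post; the sample line is a simplified stand-in, not a real log entry, and the function names are made up for illustration. The pattern also accepts curly quotes, since forum rendering may have converted them.

```python
import re

# Matches the mem_max field as quoted in the post; accepts straight or curly quotes.
MEM_MAX_RE = re.compile(r'\[mem_max=["“](\d+) Bytes')

def mem_max_bytes(line):
    """Return the mem_max value in bytes, or None if the field is absent."""
    match = MEM_MAX_RE.search(line)
    return int(match.group(1)) if match else None

def over_quota(lines, quota_bytes):
    """Yield (bytes_used, line) for each line whose mem_max exceeds quota_bytes."""
    for line in lines:
        used = mem_max_bytes(line)
        if used is not None and used > quota_bytes:
            yield used, line

QUOTA = 1 << 30  # 1 GiB, the per-query quota from the post

# Simplified stand-in for an expensive_query entry (hypothetical, not a full log line).
sample = ['[expensive_query] [mem_max="5738940926 Bytes (5.34 GB)"]']

for used, _ in over_quota(sample, QUOTA):
    print(f"mem_max is {used / QUOTA:.2f}x the quota")
```

To run it against a real file, pass an open file handle for tidb.log as the lines argument instead of the sample list.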
My questions:
Why did the TiDB server instance memory usage reach 200GB even though I set the memory limit parameter to 24GB? Also, why did the log show SQL queries exceeding the 1GB memory limit despite setting the maximum memory for a single SQL query?
Memory tracking in this area does not seem to be very accurate for some operators, such as hash joins. I suspect that is the reason.
Also, since this is a mixed deployment, first check the memory usage of each component to confirm that tidb-server is really the one using the memory.
Confirmed: the memory is being used by the TiDB component.
How can this be solved? By upgrading the version?
Or are there any other ways to limit the memory usage of the TiDB server?
The version seems fairly old. You might want to consider upgrading.
After looking at some cases, this does not seem to be caused by the TiDB version; some people have run into it on 5.x as well. It is possible that some large queries can bypass the tidb_mem_quota_query setting.
For a mixed deployment, it is recommended to bind the TiDB server to a NUMA node, so that at least it does not affect the other components.
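As a rough illustration of the NUMA suggestion, the sketch below builds a numactl command line that pins both CPU and memory to a single node. The --cpunodebind and --membind options are standard numactl options; the node number, binary path, and config path are placeholders, and the helper name is made up. This is only a sketch of the idea, not a tested deployment recipe.

```python
import shlex

def numactl_command(node, server_argv):
    """Prefix a server command with numactl so CPU and memory stay on one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *server_argv]

# Placeholder paths; adjust to the actual deployment layout.
cmd = numactl_command(0, ["bin/tidb-server", "--config", "conf/tidb.toml"])
print(shlex.join(cmd))
```

With memory bound to one node, a runaway tidb-server should exhaust that node's memory rather than the whole machine's, which is the isolation the reply is after.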
The tidb_server_memory_limit parameter in TiDB version 6.5 seems to address the OOM issue of the TiDB server.
The official documentation states:
System Variables | PingCAP Documentation Center
The official documentation also includes troubleshooting methods for TiDB OOM issues:
TiDB OOM Troubleshooting | PingCAP Documentation Center
You can check if it is useful for you.
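For reference, after an upgrade to 6.5 the two limits discussed in this thread could be set with statements like the ones generated below. This is a hedged sketch: the variable names come from the thread, the 24GB and 1 GiB values are the poster's, and the helper function is invented for illustration. It only builds the SQL text and does not connect to a cluster; check the System Variables page for the accepted value formats and scopes on your version.

```python
def memory_limit_statements(server_limit, query_quota_bytes):
    """Build SET GLOBAL statements for the instance-wide and per-query memory limits.

    server_limit is a size string such as '24GB'; query_quota_bytes is an integer.
    The values are interpolated directly, so do not pass untrusted input.
    """
    return [
        f"SET GLOBAL tidb_server_memory_limit = '{server_limit}';",
        f"SET GLOBAL tidb_mem_quota_query = {int(query_quota_bytes)};",
    ]

# Values from the original post: 24 GB for the instance, 1 GiB per query.
statements = memory_limit_statements("24GB", 1 << 30)
for stmt in statements:
    print(stmt)
```

The printed statements can be pasted into a MySQL client connected to TiDB. Note that tidb_server_memory_limit is not available on 4.0.12, so this applies only after the upgrade.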
Yes, since we are on the older version, we are preparing to bind the cores.
Follow the latest live broadcast explanation