Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb-server 内存占用过高时的报警 (Alerting when tidb-server memory usage is too high)
[TiDB Usage Environment] Production Environment
[TiDB Version] v5.3.4
[Reproduction Path] None
[Encountered Problem: Phenomenon and Impact] There are only 4 physical machines, each with 300G of memory, in a mixed deployment of tidb, tikv, and pd. The memory allocated to tidb-server is 64G. When the issue occurred, 3 tidb-servers crashed almost simultaneously. /var/log/messages contains oom-killer entries, but there is no oom-killer log in tidb.log. How can I change the default settings so that an alert is triggered when tidb-server’s memory usage reaches 80% of its limit, and information such as goroutine, heap, and running_sql is recorded?
Haven’t you already found the parameters that need to be modified? Is there some other problem?
By default, the tidb-server instance will print alarm logs and record related log files when the machine memory usage (the machine memory is 300G) reaches 80% of the total memory. I want the tidb-server instance to print alarm logs and record related log files when the “instance memory” (the memory set for tidb-server is 64G) usage reaches 80%. Is there a way to do this?
When the memory threshold alarm function is enabled, if the configuration item server-memory-quota is not set, the memory alarm threshold is memory-usage-alarm-ratio * system memory size; if server-memory-quota is set and greater than 0, the memory alarm threshold is memory-usage-alarm-ratio * server-memory-quota. You should set server-memory-quota for tidb-server first.
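The rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not TiDB's implementation; the function name `alarm_threshold` is made up for this example, and all values are in G as used in this thread.

```python
def alarm_threshold(ratio, system_memory, server_memory_quota=0):
    """Return the memory alarm threshold, in the same unit as the inputs.

    Hypothetical helper: it mirrors the rule quoted in this thread,
    not TiDB's actual code.
    """
    # A positive quota replaces the system memory size as the base value.
    base = server_memory_quota if server_memory_quota > 0 else system_memory
    return ratio * base


# Values from this thread: 300G machine, 64G for tidb-server, ratio 0.8.
without_quota = alarm_threshold(0.8, 300)      # based on the whole machine
with_quota = alarm_threshold(0.8, 300, 64)     # based on the 64G quota
print(without_quota, with_quota)
```

With the quota unset, the alarm only fires at roughly 240G of machine memory, far above the 64G given to tidb-server; with the quota set to 64G, it fires at about 51.2G.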
If server-memory-quota is set and greater than 0, the memory alarm threshold is memory-usage-alarm-ratio * server-memory-quota.
So that should already be how it behaves now. Is it not raising an alarm?
64G * 0.8 = 51.2G of memory. Why not set the ratio to 17% of the 300G instead?
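The 17% figure comes from expressing the 51.2G target as a fraction of the 300G machine. A minimal sketch of that arithmetic, with a made-up helper name `equivalent_ratio` and values in G as used in this thread:

```python
def equivalent_ratio(instance_limit, instance_ratio, system_memory):
    """Ratio of system memory that equals instance_ratio of the instance limit."""
    # 64G * 0.8 = 51.2G, expressed as a fraction of the 300G machine.
    return instance_limit * instance_ratio / system_memory


ratio = equivalent_ratio(64, 0.8, 300)
print(round(ratio, 4))
```

Note that a ratio derived this way is still compared against memory measured at the machine level, so in a mixed deployment it is only a rough substitute for a per-instance limit.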
Can the server-memory-quota parameter be set in a production environment on version 5.3.4? Are there any known issues?
Because it is a mixed deployment, I am not sure whether memory occupied by TiKV or other components would cause TiDB to log alarms continuously, and the goroutine, heap, and running_sql records captured in that case would not be ideal either.
The official documentation has detailed explanations, take a look:
Has anyone encountered issues related to server-memory-quota in production? Looking to gather some experience.
If you upgrade to version 6.4, you can use the system variable tidb_server_memory_limit to set the maximum memory usage for TiDB. Both percentage and specific size can be set.
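To show how a percentage and a specific size both end up as a concrete byte limit, here is a rough Python sketch. The helper `resolve_memory_limit`, the accepted string formats ("80%", "64GB"), and the binary unit factors are assumptions made for this example; they are not taken from TiDB's actual parsing rules.

```python
def resolve_memory_limit(value, total_memory_bytes):
    """Resolve a limit written as a percentage ("80%") or a size ("64GB") to bytes.

    Hypothetical helper for illustration only; formats and units are assumed.
    """
    units = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}
    text = value.strip().upper()
    if text.endswith("%"):
        # Percentage form: a share of the total memory.
        return int(total_memory_bytes * float(text[:-1]) / 100)
    for suffix, factor in units.items():
        if text.endswith(suffix):
            # Size form: a number followed by a unit suffix.
            return int(float(text[:-len(suffix)]) * factor)
    # No suffix: treat the value as a plain byte count.
    return int(text)


total = 300 << 30  # the 300G machine from this thread, assumed binary units
print(resolve_memory_limit("64GB", total))
print(resolve_memory_limit("20%", total))
```

On a mixed-deployment machine, the size form is the easier one to reason about, because a percentage is relative to the whole machine rather than to the share intended for tidb-server.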
The mem-quota-query parameter is used to limit the memory usage of a single query. You can adjust the value of this parameter according to the actual situation to control the memory consumption of a single query.