Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb-server 内存占用过高时的报警 (Alerting when tidb-server memory usage is too high)
[TiDB Usage Environment] Production Environment
[TiDB Version] v5.3.4
[Reproduction Path] None
[Encountered Problem: Phenomenon and Impact] There are only 4 physical machines, each with 300G of memory, running a mixed deployment of tidb, tikv, and pd. The memory allocated to tidb-server is 64G. When the issue occurred, 3 tidb-servers crashed almost simultaneously. /var/log/messages contains oom-killer entries, but tidb.log has no corresponding entries. How can I change the default settings so that an alert is triggered when tidb-server's memory usage reaches 80% of its limit, and information such as goroutine, heap, and running_sql is recorded?
              
Didn’t you find the parameters that need to be modified? Is there any other problem?
              
By default, the tidb-server instance prints alarm logs and records the related files when machine memory usage reaches 80% of total memory (300G here). I want the alarm logs and related files to be produced when the usage of the "instance memory" (the 64G set for tidb-server) reaches 80% instead. Is there a way to do this?
              
When the memory threshold alarm function is enabled, the threshold depends on whether the configuration item server-memory-quota is set. If it is not set, the memory alarm threshold is memory-usage-alarm-ratio * system memory size. If it is set and greater than 0, the memory alarm threshold is memory-usage-alarm-ratio * server-memory-quota. So you should set server-memory-quota for tidb-server first.
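To make the rule above concrete, here is a minimal Python sketch that applies it to the figures from this thread. The function name `alarm_threshold_bytes` is hypothetical, and treating "300G" and "64G" as GiB is an assumption; it only reproduces the arithmetic and does not read any TiDB configuration.

```python
GIB = 1024 ** 3  # bytes per GiB (assumption: the thread's "G" means GiB)


def alarm_threshold_bytes(alarm_ratio, system_memory_bytes, server_memory_quota_bytes=0):
    """Return the alarm threshold in bytes, following the rule quoted above.

    Hypothetical helper for illustration only, not part of TiDB.
    """
    if server_memory_quota_bytes > 0:
        # Quota is set: the threshold is based on server-memory-quota.
        base = server_memory_quota_bytes
    else:
        # Quota is unset (0): the threshold is based on system memory size.
        base = system_memory_bytes
    return alarm_ratio * base


# Figures from this thread: 300G machines, 64G intended for tidb-server, ratio 0.8.
without_quota = alarm_threshold_bytes(0.8, 300 * GIB)
with_quota = alarm_threshold_bytes(0.8, 300 * GIB, 64 * GIB)

print(f"no quota:   {without_quota / GIB:.1f} GiB")
print(f"with quota: {with_quota / GIB:.1f} GiB")
```

With these inputs the threshold is 240.0 GiB when the quota is unset and 51.2 GiB when it is set to 64 GiB, which matches what the original poster is asking for.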
              
If server-memory-quota is set and greater than 0, the memory alarm threshold is memory-usage-alarm-ratio * server-memory-quota.
So that should already be the behavior. Is it not raising the alarm?
              
64G * 0.8 = 51.2G of memory. Why not set memory-usage-alarm-ratio to 0.17 instead, since 51.2G is about 17% of 300G?
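The arithmetic behind that suggestion can be checked with a short Python sketch. The helper name `equivalent_ratio` is hypothetical. Note the assumption involved: lowering the ratio is only equivalent if the alarm compares tidb-server's own usage against the threshold. If the default compares machine-wide memory usage, as described earlier in this thread, memory used by TiKV and PD on the same host would count toward it as well.

```python
def equivalent_ratio(target_threshold, system_memory):
    """Ratio of system memory that corresponds to a desired absolute threshold.

    Hypothetical helper for illustration; both arguments use the same unit.
    """
    return target_threshold / system_memory


# 80% of the 64G intended for tidb-server, expressed as a share of the 300G machine.
target = 64 * 0.8
ratio = equivalent_ratio(target, 300)

print(round(ratio, 4))
```

The result is roughly 0.1707, which is where the "17% of 300" figure comes from.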
              
Is it safe to set the server-memory-quota parameter in a v5.3.4 production environment? Are there any known issues?
              
Because it is a mixed deployment, I am not sure whether memory occupied by TiKV or other components would cause TiDB to log alarms continuously, in which case the recorded goroutine, heap, and running_sql information would not be very useful either.
              
The official documentation explains this in detail; take a look:
              
Has anyone run into issues with server-memory-quota in production? I would like to gather some experience.
              
If you upgrade to version 6.4, you can use the system variable tidb_server_memory_limit to set the maximum memory usage for a TiDB instance. It accepts either a percentage or a specific size.
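As a rough illustration, the Python sketch below builds the SQL statement that would set this variable. The helper `build_set_memory_limit_sql` and its validation pattern are hypothetical and deliberately loose; they are not TiDB's own parser, and the accepted ranges and units should be checked against the documentation for your version. The sketch only produces a string and does not connect to a cluster.

```python
import re

# Loose format check (an assumption, not TiDB's real validation):
# either a percentage such as "80%" or a size such as "48GB".
_LIMIT_PATTERN = re.compile(r"^(\d{1,2}%|\d+(KB|MB|GB|TB))$", re.IGNORECASE)


def build_set_memory_limit_sql(limit):
    """Return a SET GLOBAL statement for tidb_server_memory_limit.

    Hypothetical helper for illustration only.
    """
    if not _LIMIT_PATTERN.match(limit):
        raise ValueError(f"unsupported limit format: {limit!r}")
    return f"SET GLOBAL tidb_server_memory_limit = '{limit}';"


# Percentage form and absolute-size form, as mentioned above.
print(build_set_memory_limit_sql("80%"))
print(build_set_memory_limit_sql("48GB"))
```

The resulting statements can be run from any MySQL-compatible client connected to TiDB v6.4 or later.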
              
The mem-quota-query parameter limits the memory usage of a single query. You can adjust its value to suit your workload and keep the memory consumption of any one query under control.