Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 如何排查Linux slab_unreclaimable内存占用高的原因?
Hello, my production machine has 200G of memory and runs clients for another distributed file storage system. I found that memory usage had reached 150G, and the memory was not released even after I killed the process. Only restarting the machine resolves the issue.
When I looked for the largest memory consumers, I found in slabtop that SUnreclaim occupied 135G, with task_struct accounting for 130G. How can this be resolved?
Other services cannot be deployed.
Slab_unreclaimable (SUnreclaim) memory is kernel slab memory that cannot be reclaimed. When it makes up too large a share of total memory, it reduces available memory and degrades system performance. This article describes how to troubleshoot high slab_unreclaimable memory usage on Linux.
Symptoms
When you run the command cat /proc/meminfo | grep "SUnreclaim" on a Linux instance to check the SUnreclaim value, the reported amount is large (e.g., SUnreclaim: 6069340 kB). If this value exceeds 10% of the system's total memory, there may be a slab memory leak.
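The 10% check described above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function names (parse_meminfo, sunreclaim_ratio, looks_like_slab_leak) are hypothetical, and the sketch assumes the text follows the usual /proc/meminfo layout of "Name: value kB" lines.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are
    plain counts and carry no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sunreclaim_ratio(text):
    """Return SUnreclaim as a fraction of MemTotal (both in kB)."""
    info = parse_meminfo(text)
    return info["SUnreclaim"] / info["MemTotal"]


def looks_like_slab_leak(text, threshold=0.10):
    """Apply the rule of thumb above: SUnreclaim above 10% of total memory."""
    return sunreclaim_ratio(text) > threshold


# Example use on a live Linux system:
#   with open("/proc/meminfo") as f:
#       print(looks_like_slab_leak(f.read()))
```

For the machine in the question, 135G of SUnreclaim out of 200G of total memory is a ratio of about 0.675, far above the 0.10 threshold.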
Possible Causes
Slab memory is memory that kernel components (or drivers) request from the buddy system through kmalloc-like interfaces. A slab memory leak occurs when a kernel component (or driver) does not release this memory properly. Because the memory is held by the kernel rather than by a user process, killing the process does not reclaim it, and once a leak has occurred on an instance the only solution is to restart the instance.
Slab memory leaks reduce the memory available to business workloads on the instance, cause memory fragmentation, and may also trigger the system OOM Killer and cause performance jitter.
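To narrow down which slab cache is holding the memory (the question above reports task_struct), one option is to rank the caches listed in /proc/slabinfo by approximate size. The sketch below is an illustration under stated assumptions: rank_slab_caches is a hypothetical helper, it assumes the slabinfo version 2.1 column order (name, active_objs, num_objs, objsize, ...), it assumes cache names contain no spaces, and it estimates size as num_objs * objsize, which ignores per-slab overhead. Reading /proc/slabinfo usually requires root.

```python
def rank_slab_caches(text, top=10):
    """Return [(cache_name, approx_bytes)] sorted from largest to smallest.

    approx_bytes is num_objs * objsize, an estimate that ignores
    per-slab overhead and padding.
    """
    rows = []
    for line in text.splitlines():
        # Skip the "slabinfo - version" line and the "# name ..." header.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 4:
            continue  # blank or malformed line
        name = parts[0]
        num_objs = int(parts[2])  # total objects, allocated or not
        objsize = int(parts[3])   # object size in bytes
        rows.append((name, num_objs * objsize))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Example use on a live Linux system (as root):
#   with open("/proc/slabinfo") as f:
#       for name, size in rank_slab_caches(f.read()):
#           print(f"{name:30s} {size / 2**20:10.1f} MiB")
```

A cache that dominates the list and keeps growing over repeated runs is a reasonable first suspect, though this alone does not identify which kernel component or driver is responsible.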