How to troubleshoot high memory usage of Linux slab_unreclaimable?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何排查Linux slab_unreclaimable内存占用高的原因?

| username: 程序员小王

Hello, I have a production machine with 200G of memory that runs the client of another distributed file storage system. Memory usage reached 150G, and the memory was not released even after I killed the process. Only a reboot frees it.

When looking for the top memory consumers, I found that SUnreclaim was 135G, and slabtop showed task_struct occupying 130G of it. How can this be resolved?

As a result, other services cannot be deployed on this machine.

Slab_unreclaimable memory is system memory that cannot be reclaimed. When its proportion of total memory is too high, it will affect available memory and system performance. This article introduces how to troubleshoot the high usage of Linux slab_unreclaimable memory.

Symptoms

Running cat /proc/meminfo | grep "SUnreclaim" on a Linux instance shows a large SUnreclaim value (e.g., SUnreclaim: 6069340 kB). When this value exceeds 10% of the system's total memory, there may be a slab memory leak.
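The 10% check above can be sketched in Python. This is a minimal illustration, not a diagnostic tool: the function names are hypothetical, and it assumes the total is taken from the MemTotal field of /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sunreclaim_ratio(text):
    """Return SUnreclaim as a fraction of MemTotal (assumed to be the total)."""
    fields = parse_meminfo(text)
    return fields["SUnreclaim"] / fields["MemTotal"]


def exceeds_threshold(text, threshold=0.10):
    """True when SUnreclaim is above the 10% rule of thumb from the article."""
    return sunreclaim_ratio(text) > threshold
```

On the affected host, the text of /proc/meminfo can be read with open("/proc/meminfo").read() and passed to exceeds_threshold.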

Possible Causes

Slab memory is memory that kernel components (or drivers) request from the buddy system through kmalloc-like interfaces. A slab memory leak occurs when a kernel component (or driver) does not release this memory properly. Once a slab memory leak occurs on an instance and the memory cannot be reclaimed by killing the process, the only solution is to restart the instance.

Slab memory leaks reduce the memory available to business workloads on the instance and cause memory fragmentation. They may also trigger the system OOM Killer and cause performance jitter.
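To see which slab cache is responsible, the caches listed in /proc/slabinfo can be ranked by size. The sketch below assumes the version 2.1 column layout (name, active_objs, num_objs, objsize, ...) and estimates each cache as num_objs × objsize, which is only an approximation of what slabtop reports. The function names are hypothetical, and reading /proc/slabinfo usually requires root.

```python
def parse_slabinfo(text):
    """Return {cache name: (active_objs, num_objs, objsize_bytes)}."""
    caches = {}
    for line in text.splitlines():
        # Skip the version banner and the column-description header.
        if line.startswith(("slabinfo", "#")):
            continue
        parts = line.split()
        if len(parts) < 4 or not all(p.isdigit() for p in parts[1:4]):
            continue
        active, total, objsize = (int(p) for p in parts[1:4])
        caches[parts[0]] = (active, total, objsize)
    return caches


def top_caches(text, n=5):
    """Return the n largest caches as (name, approximate bytes), largest first."""
    sizes = [
        (name, total * objsize)
        for name, (_active, total, objsize) in parse_slabinfo(text).items()
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:n]
```

A single cache that dominates the list, such as task_struct in this thread, narrows down which kernel subsystem or driver to investigate.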

| username: Icemap | Original post link

Hello, under what circumstances did this happen? Are you deploying TiDB, TiKV, or PD?
And which file storage system client is deployed alongside it?

Also, I see that you seem to have copied content from an article here:

slab_unreclaimable memory is system unreclaimable memory. When its proportion of total memory is too high, it will affect available memory and system performance. This article introduces how to troubleshoot the reasons for high Linux slab_unreclaimable memory usage.

Can you provide the link to the article?

| username: 程序员小王 | Original post link

This machine runs the BeeGFS storage client, not the TiDB service. Because of this problem, we don't dare to deploy TiDB on it. The article can be found by searching for "how to troubleshoot the high memory usage of Linux slab_unreclaimable".

| username: Jiawei | Original post link

| username: 程序员小王 | Original post link

Today I verified that a shell script that continuously executes an ordinary command does not increase the number of tasks, but continuously executing a BeeGFS client command (one that queries BeeGFS-related information) does.

  1. Using slabtop to check, task_struct occupies 130G and is increasing.
    Note that this setup uses the IB network. Reference: Release Notes v7.3.1, BeeGFS Documentation 7.3.1

  2. The BeeGFS client consists of a kernel module and two system services.
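The comparison described in this post (an ordinary command versus a BeeGFS client command) can be made measurable by taking a /proc/slabinfo snapshot before and after running the command in a loop and comparing the task_struct object count. This is a rough sketch with hypothetical function names. It assumes the version 2.1 layout, where the third column is num_objs.

```python
def cache_object_count(slabinfo_text, cache="task_struct"):
    """Return <num_objs> for one cache from /proc/slabinfo text."""
    for line in slabinfo_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == cache:
            return int(parts[2])
    raise KeyError(f"cache {cache!r} not found in slabinfo text")


def cache_growth(before_text, after_text, cache="task_struct"):
    """Objects added to the cache between two snapshots (negative if it shrank)."""
    return cache_object_count(after_text, cache) - cache_object_count(before_text, cache)
```

If the count keeps climbing across repeated runs of the suspect command and never drops after the processes exit, that points to a kernel-side leak rather than a userspace one.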

| username: 程序员小王 | Original post link

The default value of tidb_enable_clustered_index is INT_ONLY. This means that the clustered index is enabled by default only for tables with an integer primary key.

| username: 程序员小王 | Original post link

The issue has been resolved. It was related to the IB driver version.

| username: Billmay表妹 | Original post link

Could you check the BeeGFS-related forums for help?
