Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 某tikv节点频繁oom,但服务器内存是充足的 (a certain TiKV node frequently OOMs, but the server has plenty of memory)
【TiDB Usage Environment】Production Environment
【TiDB Version】v4.0.9
【Encountered Problem: Phenomenon and Impact】
A certain TiKV node in the cluster frequently encounters OOM, but the server’s memory is sufficient. What could be the issue?
【Resource Configuration】
Because I wasn't the one who deployed this cluster, I'm worried an upgrade might run into problems~~ So I've kept putting it off and haven't dared to upgrade.
SET tidb_mem_quota_query = 8 << 30;
I still don’t quite understand this parameter. Could you please explain it in detail?
You didn't bind cores (NUMA binding), right?
So, I have 8G now, right? Is this how I should look at it?
The key point is that my physical machine still has 60G of available memory. Even if this reaches 8G, it shouldn’t OOM, right? I don’t understand.
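For what it's worth, 8 << 30 is just 8 shifted left by 30 bits, i.e. 8 * 2^30 = 8589934592 bytes = 8 GiB, and tidb_mem_quota_query is the memory quota for a single SQL query on the tidb-server (in bytes), so it caps TiDB query memory rather than the tikv-server process. A quick way to sanity-check it in a SQL session (the SHOW statement is standard; the variable name comes from the statement above):

SET tidb_mem_quota_query = 8 << 30;            -- 8 * 2^30 = 8589934592 bytes = 8 GiB
SELECT 8 << 30;                                -- prints 8589934592
SHOW VARIABLES LIKE 'tidb_mem_quota_query';    -- confirm the current session value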
When everything is normal, the available memory is 60G. When it crashes, the memory is released, and the available memory reaches 120G. Is there any problem with this?
Sorry, I mistook it for TiKV-Details’ memory at that time. 
Haha, thanks for participating~~ 
This is from my messages log. Can anyone make sense of it? How much memory had I used when it got killed, 87G?
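If it helps, here is a rough way to pull the kill record out of the system log and read the numbers; the log path is an assumption (a CentOS-style /var/log/messages), and the anon-rss value in the kernel message is in kB:

grep -i -E 'oom-killer|out of memory|killed process' /var/log/messages | grep -i tikv
# anon-rss is reported in kB, so divide by 1024*1024 to get GiB, e.g.:
echo $(( 91226112 / 1024 / 1024 ))    # hypothetical anon-rss of 91226112 kB -> 87 GiB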
Could you send the configuration file? Did you use Cgroup to limit the memory?
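One way to rule a cgroup limit in or out is to look at the cgroup the tikv-server process actually runs in and check its memory limit. This is only a sketch and assumes cgroup v1 and a process named tikv-server:

cat /proc/$(pidof tikv-server)/cgroup
# take the memory cgroup path from the line above, then:
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.limit_in_bytes    # a huge value here means effectively no limit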
Check the topology file to confirm if NUMA binding is enabled. If it is, only a portion of the memory can be used.
I didn’t see any cgroup-related configuration.
What is a topology file? How do you view it?
Now we can see numa_node = "1" in the configuration file. You can check the size of node 1 by running numactl --hardware; it is probably no more than 60G. Combined with the anon-rss in your second message above, that pretty much confirms this is a core-binding (NUMA) issue.
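To make that check concrete (numactl is the tool mentioned above; the grep just trims the output to the per-node memory lines):

numactl --hardware
numactl --hardware | grep -E 'node [0-9]+ (size|free)'    # shows the "node X size" / "node X free" lines per NUMA node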
Take a look at the reply above. I estimate the usable memory for your numa_node 1 is only about 60G. If you want to use more, either don't bind to a NUMA node at all (comment out the related configuration) or bind multiple NUMA nodes, something like numa_node: "0,1".
tiup cluster edit-config <cluster-name>
<cluster-name> is the name of the cluster to operate on.
In the opened file, check if there are any NUMA-related settings for TiKV.
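For illustration, the binding usually sits under the TiKV instance entry in that topology; the host below is made up, and the two numa_node lines show the two ways out mentioned above (comment the binding out, or bind both nodes):

tikv_servers:
  - host: 10.0.0.1            # hypothetical host
    # numa_node: "1"          # current binding: only node 1's memory is usable
    numa_node: "0,1"          # or bind both nodes instead

After editing, the change only takes effect once the instance is reloaded/restarted, e.g. with tiup cluster reload.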
If you don’t want to OOM, adjust the sizes of various blockcaches and memtables.
https://docs.pingcap.com/zh/tidb/stable/tune-tikv-memory-performance#tikv-内存参数性能调优
Check out this article.
Underneath, TiDB's storage layer (TiKV) runs on RocksDB, which has 4 CFs. Each CF has memtables (up to write-buffer-size * max-write-buffer-number) and uses the block cache (corresponding to [storage.block-cache]). Turning these down reduces memory usage.
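As a rough sketch of where those knobs live in tikv.toml (the values are placeholders, not recommendations; under tiup they would go into server_configs.tikv or the instance's config in the topology):

[storage.block-cache]
shared = true                      # one block cache shared by all CFs
capacity = "30GB"                  # total block cache size

[rocksdb.defaultcf]
write-buffer-size = "128MB"        # size of one memtable
max-write-buffer-number = 5        # max memtables kept for this CF

[rocksdb.writecf]
write-buffer-size = "128MB"
max-write-buffer-number = 5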