Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: k8s模式下tikv内存缓慢增长,最后oom (TiKV memory grows slowly in k8s mode and eventually OOMs)
[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.0
The cluster has been deployed and in use for about two years. During this period, memory has been growing slowly (possibly because the data keeps growing). Each time memory exceeds the k8s limit, we update the configuration and the TiKV nodes go through a rolling restart, but memory gradually climbs back toward the critical value again. Is this normal? Is there any way to control it?
Configuration file:
tidb.yml (36.3 KB)
Memory screenshot
Try configuring storage.block-cache.capacity in TiKV.
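For example, a minimal sketch in the TiKV configuration (the 30GB value is illustrative, not a recommendation):

[storage.block-cache]
capacity = "30GB"

If your version supports online configuration change, the same option can also be adjusted from a SQL client with SET CONFIG tikv `storage.block-cache.capacity` = '30GB'; without restarting TiKV.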
Okay, I’ll give it a try.
You could try configuring it to 20~30 GB and see how it goes.
What is the memory of the node where TiKV is located? Are there other services on the node?
You can try limiting concurrency, using small transactions, controlling the maximum memory usage parameters, etc.
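To illustrate the kind of knobs meant here, a sketch of TiKV-side options (parameter names are from the TiKV configuration file; the values are placeholders, tune them to your workload):

[readpool.unified]
max-thread-count = 8   # cap read concurrency

[server]
grpc-concurrency = 4   # cap gRPC worker threads

The overall memory cap itself is controlled by memory-usage-limit, which is discussed further down in this thread.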
I configured 40 GB yesterday, but after one night TiKV still exceeded that limit.
The physical node has 96 GB of memory, and other services are also deployed on it via k8s.
Try these
memory-usage-limit = "40G"
[server]
grpc-memory-pool-quota = "1G"
[storage.block-cache]
capacity = "30GB"
[rocksdb]
max-total-wal-size = "2GB"
enable-statistics = false
[rocksdb.defaultcf]
write-buffer-size = "512MB"
max-write-buffer-number = 5
[rocksdb.writecf]
write-buffer-size = "512MB"
max-write-buffer-number = 5
[rocksdb.lockcf]
write-buffer-size = "512MB"
max-write-buffer-number = 5
[raftdb.defaultcf]
write-buffer-size = "128MB"
max-write-buffer-number = 5
[raft-engine]
memory-limit = "512MB"
Is the node’s memory usage normal?
Okay, I’ll give it a try.
The node’s memory usage is normal, but this TiKV pod’s memory keeps growing slowly and eventually OOMs.
Find the cause of the continuous growth; some memory is probably not being released. If it keeps growing, it will eventually OOM (Out of Memory), since memory is finite after all.
Set storage.block-cache.capacity to 45% of the total memory you think TiKV can use. If the 96 GB physical machine runs only TiKV, setting it to 40 GB should not be a problem; however, if many other pods run on that machine, it is recommended to set it smaller.
It’s normal for memory usage to exceed that value. The recommendation is to set storage.block-cache.capacity to around 45% of the memory limit you configured for the pod; if you set it to 40 GB, total usage will certainly go above 40 GB. The block cache is only the largest consumer, and TiKV also uses memory for other things, including various caches and gRPC communication.
If the TiKV cluster has a high write load and memory usage exceeds normal levels, set memory-usage-limit to 75% of the total memory.
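As a rough worked example, assuming a pod memory limit of 48 Gi (the value suggested in the next reply):

storage.block-cache.capacity ≈ 48 GiB × 0.45 ≈ 21.6 GiB
memory-usage-limit           ≈ 48 GiB × 0.75 = 36 GiB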
Please share the output of kubectl -n default get tc advanced-tidb -oyaml. Additionally, refer to the official documentation and try the following (modifications will trigger a rolling restart of TiKV):
tikv:
  requests:
    cpu: "2000m"
    memory: "48Gi"
    storage: "100Gi"
  limits:
    cpu: "4000m"
    memory: "48Gi"
Okay, okay, I’ll give it a try, thank you!