Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB nodes and TiKV nodes OOM
In the test environment, we found that the TiDB nodes and TiKV nodes restart frequently. Relevant information from the system logs:
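To confirm that these restarts are kernel OOM kills, a minimal sketch (assuming a systemd-based Linux host; adjust to your log setup):

```shell
# Kernel OOM-killer messages around the restart window
dmesg -T | grep -i "out of memory"
journalctl -k --since "24 hours ago" | grep -i "killed process"
```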
For hybrid deployments, set resource limits; for example, manually lower the TiKV block-cache.
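For reference, a minimal sketch of that setting in tikv.toml (the value is illustrative, not a recommendation; TiKV defaults to roughly 45% of system memory per instance, so multiple instances on one host need a much smaller explicit value):

```toml
# tikv.toml (fragment) -- illustrative value, size it for your host
[storage.block-cache]
capacity = "50GB"
```

The same setting can also be changed through `tiup cluster edit-config` followed by a reload.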
This is an OOM issue. You can refer to the documentation to configure the relevant memory parameters for both TiDB and TiKV.
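As a sketch of the TiDB side (assuming the cluster is v6.4 or later, where the instance-level memory limit variable exists; values are illustrative):

```sql
-- Cap each TiDB instance's total memory usage (available since v6.4)
SET GLOBAL tidb_server_memory_limit = '32GB';

-- Per-query memory quota in bytes (4 GiB here, purely illustrative)
SET GLOBAL tidb_mem_quota_query = 4294967296;
```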
Also consider optimizing the SQL.
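One way to find memory-hungry statements is the slow-query table (a sketch; `cluster_slow_query` and its `mem_max` column come from TiDB's information_schema):

```sql
-- Top statements by peak memory across all TiDB instances
SELECT time, mem_max, query
FROM information_schema.cluster_slow_query
ORDER BY mem_max DESC
LIMIT 10;
```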
TiDB and TiKV are mixed-deployed, with 2 TiKV instances and 2 TiDB instances on each server. The block-cache was lowered from 128 GB to 50 GB, but the restarts continue after the change. Should it be reduced further? The machine has 512 GB of memory and 128 cores.
I checked the SQL during the restart period, and the maximum memory usage did not exceed 1GB.
Is TiKV still getting OOM-killed? Please share some monitoring graphs:
- Overview → System info → Memory Available
- TiDB → Server → Memory Usage
- TiKV-Details → Cluster → Memory
- TiDB-Runtime → OOM Nodes → Memory Usage
Also, check whether NUMA binding is configured, and if so, how the instances are bound (see the topology sketch below).
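For reference, a minimal sketch of NUMA binding in a tiup topology file (host, ports, and node numbers are hypothetical; `numa_node` requires numactl installed on the host):

```yaml
# topology.yaml (fragment) -- hypothetical host and ports
tidb_servers:
  - host: 192.168.1.10
    port: 4000
    numa_node: "0"   # pin this TiDB instance to NUMA node 0
  - host: 192.168.1.10
    port: 4001
    numa_node: "1"
tikv_servers:
  - host: 192.168.1.10
    port: 20160
    status_port: 20180
    numa_node: "0"   # keep each TiKV instance on its own NUMA node
  - host: 192.168.1.10
    port: 20161
    status_port: 20181
    numa_node: "1"
```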
TiDB and TiKV nodes should preferably not be deployed together, as this can easily lead to memory pressure and cause processes with high memory usage to be killed.
Mixed deployment is also not good.
It looks like memory overflowed (OOM).