TiDB Node and TiKV Node OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb节点、tikv节点oom (TiDB node / TiKV node OOM)

| username: 胡杨树旁

In the test environment, it was found that the TiDB nodes and TiKV nodes frequently restart. Relevant information from the system logs:



| username: Ming | Original post link

For hybrid deployments, set resource limits; for example, manually lower TiKV's block-cache.

| username: TiDBer_7S8XqKfl-1158 | Original post link

  • storage.block-cache.capacity: adjust the capacity of the storage-layer block cache.
  • raftdb.rate-limiter.capacity: adjust the RaftDB write rate limit.
  • grpc-concurrency: adjust gRPC concurrency to reduce memory usage.
  • server.grpc-concurrent-streams: adjust the maximum number of concurrent gRPC streams.

Try adjusting the above parameters.
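As a sketch, parameters like these can be set per component under `server_configs` when running `tiup cluster edit-config`. The values below are placeholders, not recommendations, and the exact key names (especially the rate-limiter one listed above) should be verified against the TiKV configuration reference for your version:

```yaml
# Illustrative fragment for `tiup cluster edit-config <cluster-name>`.
# Values are placeholders; verify key names against your version's
# TiKV configuration template before applying.
server_configs:
  tikv:
    storage.block-cache.capacity: "40GB"
    server.grpc-concurrency: 4
```

After editing, a `tiup cluster reload` is needed for the changes to take effect.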
| username: Jasper | Original post link

This is an OOM issue. You can refer to the documentation to configure the relevant parameters.

TiDB:

TiKV:

| username: MrSylar | Original post link

Also consider optimizing the SQL.
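To find candidate statements to optimize, one option is to look for the most memory-hungry queries in TiDB's slow-query table (assuming the slow query log is enabled, which it is by default); a query along these lines could help:

```sql
-- Top statements by peak memory use, from TiDB's slow-query table.
SELECT Time, Query_time, Mem_max, Query
FROM INFORMATION_SCHEMA.SLOW_QUERY
ORDER BY Mem_max DESC
LIMIT 10;
```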

| username: tony5413 | Original post link

  1. Memory needs to be allocated reasonably between TiDB and TiKV.
  2. If the memory is too low, it needs to be expanded.
  3. Check whether other processes on the server are using a lot of memory.
  4. Could it be caused by certain SQL statements being executed?
| username: 胡杨树旁 | Original post link

TiDB and TiKV are deployed together, with 2 TiKV instances and 2 TiDB instances on each server. The block-cache was reduced from 128 GB to 50 GB, but the restarts continue after the change. Should it be reduced further? The machine has 512 GB of memory and 128 cores.
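As a back-of-the-envelope sketch of the memory budget on such a host (the OS reserve, the per-TiDB limit, and the 2x-block-cache rule of thumb below are all assumptions for illustration, not official sizing guidance):

```python
# Rough memory-budget sketch for one hybrid host: 512 GB RAM,
# 2 TiKV + 2 TiDB instances. Assumptions (not official guidance):
# - a TiKV process uses roughly 2x its block-cache once memtables,
#   Raft state, and gRPC buffers are counted
# - each TiDB instance is capped at an assumed limit
# - some headroom is reserved for the OS and page cache

TOTAL_GB = 512
N_TIKV = 2
N_TIDB = 2
OS_RESERVE_GB = 32   # assumed OS / page-cache headroom
TIDB_LIMIT_GB = 64   # assumed per-TiDB cap (e.g. tidb_server_memory_limit)

tikv_budget = TOTAL_GB - OS_RESERVE_GB - N_TIDB * TIDB_LIMIT_GB
per_tikv = tikv_budget / N_TIKV
block_cache_gb = per_tikv / 2  # leave half of each TiKV's budget for non-cache memory

print(f"per-TiKV budget: {per_tikv:.0f} GB, block-cache ceiling: {block_cache_gb:.0f} GB")
```

Under these assumptions, 50 GB per block-cache should fit comfortably, which suggests the pressure may be coming from somewhere else (other processes, no per-TiDB memory cap, or NUMA/placement effects) rather than the cache size alone.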

| username: 胡杨树旁 | Original post link

I checked the SQL during the restart period, and the maximum memory usage did not exceed 1GB.

| username: Ming | Original post link

Is TiKV still getting killed? Please share some monitoring graphs:

  1. Overview → System info → Memory Available
  2. TiDB → Server → Memory Usage
  3. TiKV-Details → Cluster → Memory
  4. TiDB-Runtime → OOM Nodes → Memory Usage

Also, check whether NUMA binding is used, and if so, how it is bound.

| username: TiDBer_tvqzG8Dk | Original post link

Memory overflowed.

| username: FutureDB | Original post link

TiDB and TiKV nodes should preferably not be deployed on the same machine, as this easily leads to memory pressure and causes the processes with the highest memory usage to be killed.

| username: 濱崎悟空 | Original post link

Mixed deployment is also not good.

| username: Hacker_zuGnSsfP | Original post link

It seems like the stack overflowed.