Do you have any recommendations for setting the server.grpc-memory-pool-quota parameter in TiDB 5.1?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 5.1 的server.grpc-memory-pool-quota 这个参数 有什么设置建议吗

| username: liuwenhe

【TiDB Environment】Production
【TiDB Version】5.1
【Encountered Issue】After a TiKV OOM, we suspect the cause may be gRPC message accumulation, so I want to cap gRPC's maximum memory usage by adjusting the parameter server.grpc-memory-pool-quota. Are there any considerations for this parameter? For example, my server has 32GB of memory, with 15GB allocated to the block cache. I plan to allocate 12GB to server.grpc-memory-pool-quota. Is this reasonable?
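For reference, the setup described above would look roughly like the following tiup topology / edit-config fragment. This is a sketch using the values from the question, not a tuning recommendation; note that on a 32 GB host, a 15 GB block cache plus a 12 GB gRPC quota leaves only about 5 GB for everything else TiKV does.

```yaml
# Sketch only; values are taken from the question above, not a recommendation.
server_configs:
  tikv:
    storage.block-cache.capacity: "15GB"   # block cache size mentioned in the question
    server.grpc-memory-pool-quota: "12GB"  # proposed cap on gRPC memory usage
```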
【Reproduction Path】What operations were performed that led to the issue
【Issue Phenomenon and Impact】

| username: cs58_dba | Original post link

If temporary disk usage exceeds tmp-storage-quota, an Out Of Global Storage Quota error is reported. Currently, the operators that support spilling to disk are Sort, MergeJoin, HashJoin, and HashAgg; memory statistics for other operators are not yet accurate, so spilling to disk is not supported for them.
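For context, the spill-to-disk behavior referenced here is configured on the TiDB side (see the linked documentation). A hedged sketch, where the quota value is purely illustrative:

```yaml
# Sketch of the TiDB-side spill-to-disk settings; the 10 GiB quota is illustrative.
server_configs:
  tidb:
    oom-use-tmp-storage: true        # allow supported operators to spill to disk on OOM risk
    tmp-storage-quota: 10737418240   # quota in bytes (10 GiB); -1 (the default) means no limit
```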
https://docs.pingcap.com/zh/tidb/stable/configure-memory-usage/#数据落盘

| username: cs58_dba | Original post link

The official recommendation is to limit memory usage only when an Out of Memory (OOM) situation actually occurs. Note that limiting memory usage may cause stalls.

| username: liuwenhe | Original post link

It is indeed an OOM issue now, with high memory usage on the TiKV nodes. There are currently two approaches:

  1. Limit TiKV's resource usage via the topology's resource_control settings (not sure whether this is supported on an already-deployed TiDB cluster; with no test environment, it's a headache):

global:
  user: "tidb"
  resource_control:
    memory_limit: "2G"

  2. Limit gRPC memory usage by setting the following TiKV parameter:

tikv:
  server.grpc-memory-pool-quota
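On the question of applying either setting to an already-running cluster: tiup can edit the cluster configuration in place and roll it out. A hedged sketch (the cluster name is a placeholder, and whether a given change takes effect on reload should be verified in a test environment first):

```shell
# Edit the deployed cluster's configuration in place (opens an editor).
tiup cluster edit-config mycluster

# Roll the change out; -R tikv restarts only the TiKV instances.
tiup cluster reload mycluster -R tikv
```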
| username: cs58_dba | Original post link

In our current production design, each TiKV is configured on a physical machine with 512GB of memory, maximizing performance.

| username: liuwenhe | Original post link

You guys are impressive, hahaha, we only have 32GB.

| username: cs58_dba | Original post link

TiDB is usually only worth considering for core systems with large data volumes. If the hardware configuration is too low, its performance will be worse than a centralized database, which defeats the purpose of using it.

| username: liuwenhe | Original post link

That’s true. Have you set the parameter server.grpc-memory-pool-quota? And for your 512GB physical machine, are you deploying a single instance?

| username: cs58_dba | Original post link

This is the current plan.

| username: liuwenhe | Original post link

The recommendation is that a single TiKV instance not exceed 2TB. For your 5TB TiKV setup, are you planning to run multiple instances? Or is that capacity intended for backups? Shouldn't backups use shared disks?

| username: cs58_dba | Original post link

That's the complete plan; the storage layout can be adjusted later. Initially only 6 machines are planned.

| username: liuwenhe | Original post link

For a backup and recovery plan covering 10TB of data, you would need to mount the 10TB disk to each TiKV node via NFS, right?
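If NFS is used as described above, BR's local-storage scheme expects the same path to be mounted on every TiKV node (and on the host running BR). A hedged sketch, with the PD address and mount path as placeholders:

```shell
# /br_backup must be the same NFS mount on every TiKV node and the BR host.
br backup full \
    --pd "127.0.0.1:2379" \
    --storage "local:///br_backup" \
    --log-file backup.log
```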

| username: jansu-dev | Original post link

  1. As cs58_dba mentioned, “Those considering TiDB are usually core systems with large data volumes. If the configuration is too low, the performance will be worse than centralized databases.”

  2. The documentation also points out that "limiting memory usage may cause stalls." Is this tolerable for your business?

Currently, no related known bugs have been found on GitHub.

| username: cs58_dba | Original post link

Yes, it is generally adjusted manually and dynamically.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.