What are the parameters for limiting TiKV instance memory usage?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于限制TiKV实例内存使用的参数都有哪些?

| username: OnTheRoad

  1. memory-usage-high-water parameter
    I could not find an explanation of this parameter in the official documentation. The blog post “Summary of TiKV Main Memory Structure and OOM Troubleshooting” and the ASKTUG thread “TiKV Memory Usage Too High” describe it as limiting the maximum memory that TiKV can use, defaulting to 90% of physical memory. In addition, running show config where type='tikv' and name like '%memory%' shows that the default value of this parameter is 0.9.

  2. memory-usage-limit parameter
    I could not find an explanation of this parameter in the official documentation either; it is only briefly introduced in the TiDB v5.3.0 Release Notes. Its value is calculated from the storage.block-cache.capacity parameter. Can this parameter be used to limit TiKV’s memory usage?

  3. tikv_servers.resource_control.memory_limit parameter
    This parameter directly writes the memory limit into the systemd service file.

  4. storage.block-cache.capacity parameter
    Controls the size of the RocksDB block cache, which indirectly limits TiKV’s memory usage.

What is the priority order of these four parameters for limiting TiKV memory usage, from highest to lowest?

| username: ethercflow

  1. The memory-usage-high-water parameter is hidden from users, defaults to 0.9 as you mentioned, and does not support online updates;
  2. The memory-usage-limit parameter limits the memory usage of the various components inside TiKV, and the size of raft messages to reject is calculated from the system’s total memory and reject_messages_on_memory_ratio. If TiKV’s actual memory usage exceeds memory-usage-high-water, TiKV limits its own memory growth, for example by rejecting raft messages according to the value calculated from reject_messages_on_memory_ratio. This memory usage is counted in bytes rather than in whole pages, so TiKV’s actual memory usage will be greater than the calculated value;
  3. systemd enforces its memory limit through cgroups, so the operating system decides how memory is reclaimed from the managed process;
  4. Points 1 and 2 were designed so that users only need to focus on storage.block-cache.capacity (both are calculated from this parameter).

Priority:
Theoretically, you only need to focus on point 4, for example:

  • system=8G, block-cache=3.6G, memory-usage-limit=6G, high-water=5.4G, page-cache=2G
  • system=16G, block-cache=7.2G, memory-usage-limit=12G, high-water=10.8G, page-cache=4G
  • system=32G, block-cache=14.4G, memory-usage-limit=24G, high-water=21.6G, page-cache=8G
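
The proportions in the three examples above can be reproduced with a little arithmetic. The sketch below is only an illustration: the ratios 0.45, 0.75, and 0.25 and the function name estimate_tikv_memory are assumptions read off the numbers in this reply, not values taken from TiKV's source code, and 0.9 is the memory-usage-high-water default mentioned earlier in the thread.

```python
def estimate_tikv_memory(system_gib, high_water=0.9):
    """Estimate TiKV memory figures (in GiB) from total system memory.

    The ratios are assumptions inferred from the examples in this thread:
    block cache = 45% of system memory, memory-usage-limit = 75%,
    page cache = 25%, and high water = high_water * memory-usage-limit.
    """
    limit = system_gib * 0.75
    return {
        "block-cache": round(system_gib * 0.45, 2),
        "memory-usage-limit": round(limit, 2),
        "high-water": round(limit * high_water, 2),
        "page-cache": round(system_gib * 0.25, 2),
    }


if __name__ == "__main__":
    # Reproduce the three examples from the reply.
    for total in (8, 16, 32):
        print(f"system={total}G", estimate_tikv_memory(total))
```

If the figures observed on a real node differ noticeably from these estimates, that is the situation the troubleshooting steps below are meant for.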

If, after setting point 4, actual memory usage does not match the proportions in the examples above, first determine:

  1. Whether operating system settings are the cause, such as whether transparent huge pages (THP) are enabled and in use, and whether the page size is larger than 4 KB. For example, the default page size on the aarch64 RHEL distribution is 64 KB. As mentioned earlier, TiKV counts memory internally in bytes, so an excessively large page size can cause a large deviation through internal memory fragmentation;
  2. If operating system factors are ruled out and setting point 4 still cannot effectively control TiKV memory usage, use point 3 to control TiKV memory. When the limit is exceeded, the operating system reclaims memory and, if necessary, triggers the cgroup-level oom-killer. In addition, please submit an issue with reproduction steps when you have time, and we will reproduce the problem and improve point 4.
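
The operating system checks in point 1 can be scripted. The following is a minimal sketch assuming a Linux host; the helper names parse_thp_mode and resident_bytes are hypothetical, and the rounding function only illustrates why byte-level accounting can undercount page-level usage. It is not how TiKV measures memory.

```python
import os

# Standard Linux sysfs file that reports the transparent huge page mode.
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the active THP mode from sysfs text such as 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def resident_bytes(alloc_bytes, page_size):
    """Bytes occupied when an allocation is rounded up to whole pages."""
    pages = -(-alloc_bytes // page_size)  # ceiling division
    return pages * page_size


if __name__ == "__main__":
    print("page size:", os.sysconf("SC_PAGE_SIZE"))
    try:
        with open(THP_PATH) as f:
            print("THP mode:", parse_thp_mode(f.read()))
    except OSError:
        print("THP mode: unavailable on this host")
    # The same 5000-byte region occupies far more memory with 64 KB pages.
    for size in (4096, 65536):
        print(f"page={size}: 5000 bytes occupy {resident_bytes(5000, size)} bytes")
```

A page size above 4096 or a THP mode of always suggests checking operating system settings before concluding that storage.block-cache.capacity is not being honored.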

I hope this reply is helpful to you, and I look forward to your response. Thank you!

| username: alfred

The summary is quite accurate :+1:

| username: OnTheRoad

Thank you for your enthusiastic reply.

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.