TiKV Memory Usage Keeps Increasing

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV 内存一直增长 (TiKV memory keeps growing)

| username: TiDBer_vFs1A6CZ

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.1
[Reproduction Path]
1> Disable transparent huge pages,
2> Set storage.block-cache.capacity = 48G,
3> memory-usage-limit = 82G
[Encountered Problem: Phenomenon and Impact]
TiKV memory is limited to 82G, but observed cluster memory has already grown beyond 90G.

Why is the memory limit condition not taking effect?
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]

| username: Fly-bird | Original post link

Did you reload?

| username: TiDBer_小阿飞 | Original post link

Is tidb_server_memory_limit set at the GLOBAL level?
Additionally, after setting this variable (for example, to 32 GB), when the memory usage of the tidb-server instance reaches 32 GB, TiDB will terminate the SQL operations with the highest memory usage one by one until the instance's memory usage drops below 32 GB. The forcibly terminated SQL operations return the error message Out Of Memory Quota! to the client.
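As a sketch, this is how the variable could be checked and set (the 32 GB value mirrors the example above; it is not a recommendation):

```sql
-- Check the current limit (GLOBAL scope).
SHOW VARIABLES LIKE 'tidb_server_memory_limit';

-- Set it at the GLOBAL level; "32GB" matches the example above.
SET GLOBAL tidb_server_memory_limit = "32GB";
```

Note, though, that this variable limits tidb-server, not TiKV, so it would not cap the memory growth described in the original post.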

| username: tidb菜鸟一只 | Original post link

Normally, you don’t need to set memory-usage-limit; you only need to set storage.block-cache.capacity. The maximum memory usage of TiKV will be limited to 5/3 * storage.block-cache.capacity. Execute SHOW CONFIG WHERE name LIKE '%storage.block-cache.capacity%'; to check whether the storage.block-cache.capacity setting has taken effect.

| username: TiDBer_vFs1A6CZ | Original post link

All have taken effect, 49G.

| username: TiDBer_vFs1A6CZ | Original post link

The unlimited memory growth is related to TiKV, not TiDB.

| username: heiwandou | Original post link

Configure the limit parameters.

| username: tidb菜鸟一只 | Original post link

Are you looking at the total memory of the TiKV node? How much memory is the TiKV process occupying on the server?

| username: 随缘天空 | Original post link

Check the Prometheus monitoring for TiKV process memory usage. If the memory usage keeps increasing, there might be a memory leak issue. Additionally, you can check the TiKV log files to see if there are any warning or error logs indicating the presence of a memory leak.

| username: 有猫万事足 | Original post link

The memory of TiKV is mainly controlled by the parameter storage.block-cache.capacity, which determines the size of the entire block-cache. This parameter affects RocksDB reads.
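For reference, this parameter lives in the TiKV configuration file; a minimal fragment might look like the following (the 48GiB value mirrors the original poster's setting, not a recommendation):

```toml
# tikv.toml -- block cache shared by all RocksDB column families
[storage.block-cache]
capacity = "48GiB"
```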

Then there is write-buffer-size, and there might be several write-buffers. This parameter affects RocksDB writes.

You can check the number and size settings through the config:
show config where name like '%write%buffer%' and type='tikv';

These two sets of parameters are actually RocksDB settings. Besides RocksDB, TiKV's own Rust layer also occupies a certain amount of memory.

Simply put, the Rust part's memory cannot be directly capped. On the RocksDB side, shrinking the block-cache reliably reduces memory usage. For example, on my 4c8g setup the original block-cache was 3.5g; reducing it to 2g stopped the TiKV memory alarms, which otherwise fired constantly whenever memory usage exceeded 80%.

So, if you want to keep memory usage below 80%, you could start by setting the block-cache below 20g, then gradually adjust it toward your ideal level while watching memory usage come down.
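Assuming the cluster supports online config change, the adjustment above could be tried without restarting TiKV (the 20GiB value follows the suggestion above and should be tuned to your workload):

```sql
-- Shrink the block cache online; storage.block-cache.capacity
-- is dynamically adjustable, so no TiKV restart is needed.
SET CONFIG tikv `storage.block-cache.capacity` = '20GiB';

-- Verify the new value took effect on all TiKV instances.
SHOW CONFIG WHERE name = 'storage.block-cache.capacity';
```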

| username: 逍遥_猫 | Original post link

Could you please advise what signs of a memory leak might appear in the TiKV logs?

| username: zxgaa | Original post link

Study and learn.

| username: 随缘天空 | Original post link

You can check the Error-level entries in the log file for OOM-related errors, or use the log search module in the Dashboard to search all error-level messages on the TiKV nodes online.

| username: andone | Original post link

Under memory constraints

| username: 逍遥_猫 | Original post link

With TiKV's storage.block-cache.capacity limited, what could still cause memory leaks?

| username: andone | Original post link

Study and learn

| username: swino | Original post link

Study and learn.

| username: Connor1996 | Original post link

The resolved-ts module has a known issue causing continuous memory growth, see Resolver memory is not reclaimed and may cause OOM · Issue #15458 · tikv/tikv · GitHub

If you don’t use Stale Read, you can disable the resolved-ts module in the TiKV configuration:

[resolved-ts]
enable = false