TiKV memory exceeds the block-cache setting value

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv内存超过block-cache设置的值

| username: TiDBer_xt

【TiDB Usage Environment】Production Environment
【TiDB Version】5.4
【Encountered Problem】TiKV memory usage remains high; are there any other parameters that control it?
【Reproduction Path】None
【Problem Phenomenon and Impact】
TiKV memory is not being released, causing insufficient cluster resources
【Attachments】
[image attachment]

| username: ddhe9527 | Original post link

TiKV memory usage generally amounts to storage.block-cache.capacity plus about 2.5 GB, the latter being the memory occupied by the memtables (write buffers).
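As a rough illustration (a hypothetical TiUP topology.yaml sketch, not taken from this thread; the 5 GB figure is just the value discussed later), the shared block cache is set like this, and the memtable portion is counted on top of it via the per-column-family write buffer settings:

```yaml
server_configs:
  tikv:
    # Shared RocksDB block cache across all column families
    storage.block-cache.capacity: "5GB"
    # The memtable portion is governed by per-CF write buffer settings such as
    # rocksdb.defaultcf.write-buffer-size and rocksdb.defaultcf.max-write-buffer-number,
    # and it adds to memory usage on top of the block cache.
```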

| username: TiDBer_xt | Original post link

From the monitoring, it seems that this parameter has already taken effect, but the memory usage keeps increasing.


That statement doesn't seem right. When I didn't configure this parameter, the default was said to be 45% of memory, but in practice it kept consuming the cluster's resources until they were exhausted.

| username: ddhe9527 | Original post link

Is there only one TiKV instance on the server? From the monitoring, what was the Block cache size value before? If you are worried about the TiKV process using too much memory, you can also use memory_limit under resource_control to cap its total memory usage.
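For reference, a hypothetical topology.yaml fragment with such a limit (TiUP applies resource_control through systemd; the host and the 8G value below are placeholders):

```yaml
tikv_servers:
  - host: 192.168.0.1            # placeholder address
    # systemd-level cap on the whole tikv-server process
    resource_control:
      memory_limit: "8G"
```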

| username: TiDBer_xt | Original post link

Previously, I set storage.block-cache.capacity to 5G. According to the monitoring it is indeed 5G now, but the resources used by the pod have already reached 7G. There are multiple nodes, with one TiKV instance per node.
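For context, when TiKV runs in pods this setting is typically applied through the TidbCluster spec; a hypothetical fragment assuming TiDB Operator is used (the memory request/limit values are placeholders, not taken from this thread):

```yaml
spec:
  tikv:
    requests:
      memory: "8Gi"              # placeholder pod memory request
    limits:
      memory: "8Gi"              # placeholder pod memory limit
    config: |
      [storage.block-cache]
      capacity = "5GB"
```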

| username: ddhe9527 | Original post link

I’m not very familiar with containers. I suggest explicitly setting the block-cache and then observing.

| username: TiDBer_xt | Original post link

I have already set it explicitly, but it has no effect. I'm asking why that is.

| username: ddhe9527 | Original post link

With the block cache set to 5 GB, total memory usage should peak at around 7.5 GB (the 5 GB block cache plus roughly 2.5 GB of memtables), and it may occasionally go a bit higher because the coprocessor sometimes needs to cache data.

| username: xiaohetao | Original post link

[image: raft engine memory_limit configuration]

| username: TiDBer_xt | Original post link

Are you saying that the value of memory_limit plus the value of block-cache is the total memory used by TiKV?

| username: ddhe9527 | Original post link

The memory_limit you're referring to is not the same one; what was posted is the raft engine's configuration. Please refer to a question I raised earlier:
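To make the distinction concrete, here is a hypothetical sketch of the two different settings being conflated (parameter availability may vary by TiKV version; hosts and values are placeholders):

```yaml
server_configs:
  tikv:
    # Raft Engine's internal memory cap: it bounds only the Raft log engine,
    # not the whole TiKV process
    raft-engine.memory-limit: "1GB"

tikv_servers:
  - host: 192.168.0.1            # placeholder address
    # systemd-level cap on the entire tikv-server process
    # (the memory_limit suggested earlier in this thread)
    resource_control:
      memory_limit: "8G"
```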

| username: xiaohetao | Original post link

:joy:

| username: TiDBer_xt | Original post link

The official documentation says you only need to set the shared block cache parameter under storage, but now that it is set, it doesn't work as expected: total memory exceeds the configured value. I just want to know why it is exceeded.

| username: xiaohetao | Original post link

Currently, the memory usage of your TiKV nodes is high. I think you can check the related memory configuration parameters.

You can refer to: TiKV Memory Parameter Performance Tuning (TiKV 内存参数性能调优) | PingCAP Docs

| username: TiDBer_xt | Original post link

I followed the instructions in the link you provided. We are not running any query statements, only insert operations.

| username: Lystorm | Original post link

Building on this question: does this mean some memory has to be reserved for the system's page cache? Is that determined automatically by TiDB based on the system, or is there a default in the configuration file? If it's in the configuration file, can it be modified?

| username: xiaohetao | Original post link

It should mean that TiKV's total configured memory is kept below the total system memory, and the remainder is left for the operating system (for example, the page cache).
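As a rough illustration of that sizing idea (the 16 GB host size is a hypothetical example, not from this thread):

```yaml
server_configs:
  tikv:
    # If unset, the block cache defaults to roughly 45% of system memory
    # (about 7.2 GB on a 16 GB host); setting it explicitly and lower keeps
    # headroom for memtables, the coprocessor, and the OS page cache.
    storage.block-cache.capacity: "5GB"
```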

| username: Lystorm | Original post link

In a K8s cluster, if I limit the memory of the pod where TiKV runs to, say, 10G, what happens to the TiKV service when the pod's memory usage reaches 10G? How is subsequent data handled?

| username: xiaohetao | Original post link

I haven’t used k8s, so I’m not very familiar with it.

| username: xiaohetao | Original post link

You can observe it for a while. Is your current workload high?
It's possible that releasing memory takes time and is slower when the workload is high.