Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv 内存一直增长 最终oom (TiKV memory keeps growing and eventually OOMs)
[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
The block cache has been reduced from its default of 28G to 16G. Each TiKV node has 64G of system memory.
In the TiKV monitoring, the CDC memory metric keeps increasing. How can we control it?
Reloading the ticdc cluster does not release the memory occupied by tikv cdc.
Why is the cdc memory usage so high? Doesn’t this affect the stability of the cluster with all the synchronization tasks?
The issue appeared after upgrading from version 5.2.1 to 6.5.0, so I suspect it is related to the version.
While data is being synchronized, it is normal for CDC memory to keep growing.
Is this reasonable? Not recycling, not releasing, not controlling the size of the occupation? It occupies 7G at startup and keeps growing to 40G, causing tikv-server to OOM. This essentially means that the synchronization software is affecting the stability of the cluster.
It may be caused by a large, wide table in the synchronization task. You can limit the memory usage of a single table through per-table-memory-quota. For details, please refer to TiCDC Overview | PingCAP Documentation Center
Setting it to 10MB and restarting CDC doesn’t help. Each CDC node has 128GB of memory and uses only around 3GB of it. However, the CDC memory on the TiKV server nodes occupies a large amount of memory. According to the documentation, the main memory consumers on the TiKV server are storage.block-cache.capacity (28GB) and write-buffer-size (default 128MB). Each TiKV server has 64GB of memory in total and is not deployed in a mixed environment. The automatically calculated memory-usage-limit is 48GB. Even if I lower storage.block-cache.capacity to 16GB, the total memory of the TiKV server still rises to 48GB and eventually hits OOM. Memory usage in the CDC cluster is very low; it is mainly the TiKV server that is high, and I have no idea why.
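As a rough sanity check of the numbers in the post above, here is a minimal sketch of the memory budget for one TiKV node. It assumes the documented TiKV defaults of 45% of system memory for storage.block-cache.capacity and 75% for memory-usage-limit; please verify both against the documentation for your version. The function name and the returned keys are made up for illustration.

```python
# Rough memory budget for a single TiKV node, using figures from this thread.
# Assumption: block cache defaults to 45% of system memory and
# memory-usage-limit defaults to 75%; check the docs for your TiKV version.

def tikv_memory_budget(system_mem_gib, block_cache_gib=None):
    """Return the default and effective budget figures in GiB."""
    default_block_cache = system_mem_gib * 0.45
    usage_limit = system_mem_gib * 0.75
    # Use the explicit block cache size if one was configured.
    block_cache = default_block_cache if block_cache_gib is None else block_cache_gib
    return {
        "default_block_cache_gib": default_block_cache,
        "block_cache_gib": block_cache,
        "memory_usage_limit_gib": usage_limit,
        # What is left under the limit for everything except the block cache.
        "headroom_gib": usage_limit - block_cache,
    }

budget = tikv_memory_budget(64, block_cache_gib=16)
for name, value in budget.items():
    print(f"{name}: {value:.1f}")
```

With 64GB of system memory and a 16GB block cache, this leaves about 32GB under the limit for everything else. The CDC memory figures of 34G to 40G reported in this thread would exceed that remainder on their own, which is consistent with the OOM described.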
The memory load on the CDC node is very low, with no pressure at all.
However, the memory on the TiKV node keeps increasing.
If possible, could you please share the TiKV-Details monitoring and TiCDC monitoring? Is there any delay in CDC synchronization now?
There is no delay in CDC. I checked the TiKV logs, and previously the tikv.log files were very small. Since the upgrade, the tikv.log files have become very large, containing some CDC-related errors.
The current tikv.log is still reporting this error.
Could the command used to create the TiCDC synchronization be related to the parameters?
tiup ctl:v6.5.0 cdc changefeed create --pd=http://10.30.30.4:2379 --sink-uri="kafka://node2.prod.com:9092/ticdc?kafka-version=2.13.3&partition-num=3&max-message-bytes=128108864&replication-factor=3&protocol=canal-json&compression.type=lz4" --changefeed-id="ticdc-prd"
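To make the parameters in that sink URI easier to review, here is a small sketch that splits it into its parts using only the Python standard library. It does not talk to TiCDC or Kafka; it only parses the string copied from the command above. The variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

# Sink URI copied verbatim from the changefeed create command above.
sink_uri = (
    "kafka://node2.prod.com:9092/ticdc?kafka-version=2.13.3&partition-num=3"
    "&max-message-bytes=128108864&replication-factor=3"
    "&protocol=canal-json&compression.type=lz4"
)

parts = urlsplit(sink_uri)
topic = parts.path.lstrip("/")          # path component is the Kafka topic
params = dict(parse_qsl(parts.query))   # query string as a plain dict

print("broker:", parts.netloc)
print("topic:", topic)
for key, value in params.items():
    print(f"  {key} = {value}")

# Convert max-message-bytes to MiB to make its size easier to judge.
max_message_mib = int(params["max-message-bytes"]) / (1024 * 1024)
print(f"max-message-bytes is about {max_message_mib:.2f} MiB")
```

This shows that max-message-bytes is set to roughly 122 MiB per message. Whether that setting has any bearing on the TiKV-side CDC memory is not established in this thread; it is only one of the parameters worth checking.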
After reloading TiKV, the CDC memory goes back to 6-7G but will keep increasing to 34G until it hits OOM (Out of Memory) for TiKV.
The TiKV logs are still reporting errors.
Please provide information about the cluster configuration and deployment, don’t make everyone guess 
The TiKV configuration is all default (many parameters have been dynamically adjusted, but it didn’t help). Only a few parameters were set for the TiDB node.
2 TiDB, 5 TiKV, 3 PD, 2 TiCDC cluster, extracting data to Kafka. Currently, the memory of the TiKV-server node keeps increasing, specifically the CDC memory. The TiCDC synchronization checkpoint is progressing normally without delay.
Use https://metricstool.pingcap.net/ to export the monitoring data of tikv-details and cdc as a JSON file and take a look.
The memory here does not refer to the memory of CDC. You can check if the resolved-ts.enable parameter is set to true, and try setting the parameter to false to see if there is any improvement?
Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
Let’s take a look at your configuration.
Couldn’t find TiKV there. There is a pprof debug interface address.
Yes, it is the CDC memory. The CDC cluster itself uses very little memory. This is the CDC memory metric reported by the TiKV server. This parameter cannot be set dynamically. Let’s wait for the memory to increase again. I will try changing the configuration file and reloading to see.
Where is ticdc installed?