Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 百亿表CDC同步致TiKV节点内存一直涨问题 (TiKV node memory keeps growing while CDC synchronizes tables with tens of billions of rows)

Background
A high-TPS TiDB cluster has two tables synchronized by CDC: one with 22.6 billion rows and one with 2.1 billion rows. On the night of the 6th, archiving the 22.6-billion-row table increased memory usage on the TiKV machines by 5G. Memory usage on the TiKV machines has been rising steadily over the past 30 days.
[Figure: memory usage of one TiKV storage node machine]
Analysis
Version Information
TiDB cluster version: v5.1.4
CDC release version: v5.1.4
TiKV machine configuration: 3.7T NVMe disk
Parameter Configuration
- Each TiKV storage node’s block-cache is set to 12G (64G memory machine)
- TiKV is deployed alone on each storage node machine, and memory usage on the TiKV nodes is 76%
- Transparent huge pages have always been disabled on the TiKV storage node machines:
cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
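The bracketed entry in that output marks the active mode, so `[never]` confirms that transparent huge pages are off. As a minimal sketch for checking this across several nodes (the function name `active_thp_mode` is hypothetical and not part of any TiDB tooling), the output line can be parsed in Python:

```python
import re


def active_thp_mode(text: str) -> str:
    """Return the mode shown in square brackets, e.g. 'never'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


# Sample line copied from the output above. On a real host the text would be
# read from /sys/kernel/mm/transparent_hugepage/enabled instead.
sample = "always madvise [never]"
print(active_thp_mode(sample))  # prints: never
```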
Characteristics of TiDB CDC Synchronized Tables
- Streaming workload with high-TPS writes in short bursts; write-heavy and read-light (40,000 to 50,000 rows inserted per second at peak)
- Batch inserts of 100 to 200 rows each
- On the 6th, CDC wrote up to about 42,000 changed rows to the downstream; on the 7th, after CDC was migrated to a high-memory machine, it wrote up to 78,000 changed rows to the downstream
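To put the write pattern in terms of statements rather than rows, the peak insert rate and batch size listed above can be combined. This is only a rough estimate, assuming every insert at peak is a full batch of 100 to 200 rows; the helper name `batches_per_second` is hypothetical.

```python
def batches_per_second(rows_per_second: int, rows_per_batch: int) -> float:
    """Estimate how many batch inserts are needed to reach a given row rate."""
    return rows_per_second / rows_per_batch


# Lower bound: slowest peak rate with the largest batches.
low = batches_per_second(40_000, 200)
# Upper bound: fastest peak rate with the smallest batches.
high = batches_per_second(50_000, 100)

print(f"{low:.0f} to {high:.0f} batch inserts per second")  # 200 to 500
```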
Main Memory Components of TiKV Process
The main component is block_cache, but the TiKV block_cache is currently 12G, far short of the 50G used by the TiKV process.
Grafana's CDC-TiKV monitoring suggests that CDC-related components on the other TiKV nodes also consume a large share of memory:
- process_resident_memory_bytes - tikv_engine_block_cache_size_bytes: about 34G on each TiKV node. A large portion of the 50G used by the TiKV process is therefore outside block_cache and is presumed to come from the TiKV CDC components
[Figure: non-block_cache memory consumption of TiKV nodes]
- tikv_cdc_sink_memory_bytes (size of the CDC change event cache waiting to be sent in TiKV): very small
- tikv_cdc_old_value_cache_bytes (size of the old value cache): very small
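The subtraction behind the non-block_cache figure can be sketched as follows. The inputs are the round numbers quoted in this post (50G process memory, 12G block_cache); the two CDC values are hypothetical placeholders, since the post only says those metrics are "very small". The Grafana panel uses live metric values, so its reading of about 34G need not match this round-number result exactly.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def non_block_cache_bytes(rss: int, block_cache: int) -> int:
    """process_resident_memory_bytes - tikv_engine_block_cache_size_bytes."""
    return rss - block_cache


def unattributed_bytes(rss: int, block_cache: int,
                       cdc_sink: int, cdc_old_value_cache: int) -> int:
    """Memory left over after block_cache and the two CDC metrics."""
    return non_block_cache_bytes(rss, block_cache) - cdc_sink - cdc_old_value_cache


rss = 50 * GIB          # TiKV process memory quoted in the post
block_cache = 12 * GIB  # block_cache size quoted in the post
cdc_sink = 64 * MIB     # hypothetical placeholder for tikv_cdc_sink_memory_bytes
cdc_old = 128 * MIB     # hypothetical placeholder for tikv_cdc_old_value_cache_bytes

other = non_block_cache_bytes(rss, block_cache)
rest = unattributed_bytes(rss, block_cache, cdc_sink, cdc_old)
print(f"non-block_cache: {other / GIB:.1f} GiB")
print(f"not covered by the two CDC metrics: {rest / GIB:.2f} GiB")
```

With placeholder values this small, almost all of the non-block_cache memory remains unaccounted for by these two metrics, which is why it is worth comparing them against the subtraction result on the real dashboard.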
Speculation
The steady growth in TiKV machine memory is suspected to come from heavy memory consumption by the TiKV CDC components. Two questions remain: why does the memory used by the TiKV CDC components keep growing, and is there an online method to release it?
Column - TiCDC Architecture and Data Synchronization Link Analysis | TiDB Community
Column - Summary of TiKV Main Memory Structure and OOM Troubleshooting | TiDB Community
TiCDC Overview | PingCAP Documentation Center