How to Release CDC Memory When TiKV CDC Memory Continues to Grow Without CDC Tasks

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv cdc memory持续增长,无cdc任务,如何释放cdc memory

| username: Hacker_gM2NmiLh

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.1
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Phenomenon and Impact]

  1. Memory usage on the TiKV nodes keeps increasing. Following the documentation (TiKV Main Memory Structures and OOM Troubleshooting Summary, TiDB Community Technical Monthly | TiDB Books; original title "TiKV主要内存结构和OOM排查总结"), we found that TiKV's CDC memory occupied 50 GB and kept growing.
    Memory-related configuration:
    Server memory: 256.9 GiB
    storage.block-cache.capacity: 180GiB
    cdc.sink-memory-quota: 512MB
  2. There are currently no CDC tasks in the cluster, so the TiCDC nodes were scaled in; however, the CDC memory on TiKV has not been released and continues to grow.

Questions:

  1. After scaling in the CDC nodes, the memory keeps growing. How can it be released?
  2. Why does the memory keep growing when the CDC component is installed but no TiCDC tasks have been created?
  3. The configuration cdc.sink-memory-quota: 512MB does not seem to take effect (see the config sketch below).
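
For context, TiKV itself exposes two CDC-related memory quotas, and one possible reason the sink quota "does not seem to take effect" is that it bounds only the sink buffer rather than everything the CDC module allocates. A minimal tikv.toml sketch (the two key names are real TiKV options; the scoping note in the comment is an assumption, not a confirmed diagnosis):

```toml
# tikv.toml (sketch): TiKV-side CDC memory quotas; both default to 512MB.
# Assumption: sink-memory-quota bounds only the CDC sink buffer and
# old-value-cache-memory-quota bounds only the old-value cache, so other
# allocations in TiKV's CDC module can still grow past these limits.
[cdc]
sink-memory-quota = "512MB"
old-value-cache-memory-quota = "512MB"
```

To confirm which values TiKV actually loaded, running `SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'cdc%'` from a TiDB client will list the effective cdc.* settings.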

[Resource Configuration]

Server memory: (screenshot not included)

CDC memory trend: (screenshots not included)

| username: WalterWj | Original post link

Upgrade. Early versions of CDC had problems with excessive memory usage.

| username: 江湖故人 | Original post link

It might be caused by TiKV generating change logs.

| username: Daniel-W | Original post link

The latest CDC versions are relatively stable; upgrading should help.

| username: Hacker_gM2NmiLh | Original post link

Is there a way to release it in the short term? Is restarting the only option?

| username: WalterWj | Original post link

Restarting is one solution.
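
If a restart is the chosen stop-gap, a rolling restart of only the TiKV role with TiUP might look like the sketch below (the cluster name "prod" is a placeholder):

```shell
# Placeholder cluster name. TiUP restarts the TiKV instances one at a time,
# which releases whatever resident memory the CDC module has accumulated.
tiup cluster restart prod -R tikv
```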

| username: Hacker_gM2NmiLh | Original post link

Additionally, I found that the expression for the CDC memory metric is TiKV resident memory minus the block cache, which may not be very accurate and could interfere with troubleshooting:

(avg(process_resident_memory_bytes{k8s_cluster="$k8s_cluster", tidb_cluster="$tidb_cluster", instance=~"$tikv_instance", job=~"tikv.*"}) by (instance))
- (avg(tikv_engine_block_cache_size_bytes{k8s_cluster="$k8s_cluster", tidb_cluster="$tidb_cluster", instance=~"$tikv_instance", db="kv"}) by (instance))

| username: WalterWj | Original post link

I think the new version of the formula is pretty OK. :thinking:

| username: YuchongXU | Original post link

Restart or upgrade

| username: chris-zhang | Original post link

It should be relatively stable after upgrading.

| username: zhang_2023 | Original post link

Upgrade to the latest version; the old version has bugs.

| username: 小于同学 | Original post link

Upgrade the version.

| username: TiDBer_rvITcue9 | Original post link

Upgrade it.

| username: 路在何chu | Original post link

I have been restarting the CDC nodes in rotation to free up memory. Is there any better solution?

| username: Soysauce520 | Original post link

It looks like you’ve encountered the bug in version 6.1. Upgrading to 6.5 should resolve the issue.
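
For reference, an in-place upgrade with TiUP is a single command; the cluster name and target patch version below are placeholders:

```shell
# Placeholder cluster name and example v6.5 patch version.
tiup cluster upgrade prod v6.5.9
```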

| username: Hacker_gM2NmiLh | Original post link

After upgrading to v7.5.1, it looks like nothing has changed, and there are now two tikv-{{instance}} series in the metric :sweat_smile:

| username: zhanggame1 | Original post link

That’s too high. The memory consumption of my setup is less than 1GB.

| username: GreenGuan | Original post link

Restart or upgrade

| username: Hacker_gM2NmiLh | Original post link

The block cache in the production environment is configured at 180 GB.

| username: Hacker_gM2NmiLh | Original post link

The upgrade still hasn’t resolved the issue.