TiCDC: How to Tell Whether Pushing to Kafka Is Slow When No Large Transactions Are Found Upstream

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 源端没有发现大事务,ticdc如何分析是否是推送到kafka慢

| username: wluckdog

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
Alert Details: TiCDC heap memory usage is over 10 GB
[Resource Configuration]
The CDC host is configured with 256 GB of memory, but only about 10 GB is actually in use, and TiCDC still uses the host's disk space. How can I adjust it to use more memory? The CDC task is also delayed. No large transactions were found on the source side, so how can I analyze whether pushing to Kafka is the slow part?
              total        used        free      shared  buff/cache   available
Mem:            251           9         232           0           9         241
Swap:            31           0          31
[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld | Original post link

Consider setting up TiCDC monitoring first, so that you have more metrics to refer to.

You can refer to the official handling method (shown as a screenshot in the original post).

| username: wluckdog | Original post link

I mainly want to know where the bottleneck behind the CDC delay is. Judging by transaction execution times over the past 24 hours, there should be no large transactions.

| username: xfworld | Original post link

Common troubleshooting steps for latency issues:

  1. Upstream
    a. Large transaction commits causing delayed processing
    b. Resource bottlenecks leading to slow event processing
    c. Network congestion

  2. CDC itself
    Check the monitoring metrics. They are easy to access, so I won't describe them further.

  3. Downstream
    a. Slow reception: downstream congestion causing a backlog and delayed processing
    b. Network congestion
    c. Insufficient resources leading to slow processing

You will need to investigate each item one by one, which can be tedious. This is for your reference only.
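
One way to narrow down the list above is to compare two lags that TiCDC shows in its monitoring: the resolved-ts lag (how far behind TiCDC is in receiving and sorting upstream changes) and the checkpoint-ts lag (how far behind it is in writing to the downstream). The Python sketch below only illustrates that reasoning. The function name, the 10-second threshold, and the verdict strings are assumptions made for the example, not TiCDC defaults, and a "both lags high" result does not rule out downstream back-pressure.

```python
def classify_cdc_lag(resolved_ts_lag_s: float,
                     checkpoint_ts_lag_s: float,
                     threshold_s: float = 10.0) -> str:
    """Rough triage of TiCDC replication delay from two lag readings (seconds).

    threshold_s is an illustrative cutoff chosen for this sketch,
    not a value taken from TiCDC.
    """
    if resolved_ts_lag_s < 0 or checkpoint_ts_lag_s < 0:
        raise ValueError("lags must be non-negative")

    if checkpoint_ts_lag_s <= threshold_s:
        # Data is reaching the downstream with little delay.
        return "healthy"
    if resolved_ts_lag_s > threshold_s:
        # TiCDC itself is behind on receiving or sorting changes:
        # look at the upstream (TiKV, network) or TiCDC resources first.
        return "upstream-or-cdc"
    # TiCDC has the data but has not finished writing it out:
    # look at the sink, for example Kafka or the network to it.
    return "downstream"


if __name__ == "__main__":
    # Hypothetical readings: changes arrive promptly, but writes lag 5 minutes.
    print(classify_cdc_lag(resolved_ts_lag_s=2, checkpoint_ts_lag_s=300))
```

With readings like the hypothetical ones in the usage line, the verdict points at the downstream side, which is the case the original question is asking about.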

| username: wluckdog | Original post link

When TiCDC is deployed on its own host, is the TiCDC heap memory consumed on the TiKV host or on the CDC host? I am still somewhat unclear on how this works.

| username: Min_Chen | Original post link

Hello, please provide the monitoring data. Export it using Clinic, referring to the method in Using PingCAP Clinic to Diagnose the Cluster.

| username: wluckdog | Original post link

Too many files are being collected, they are too large, and the collection is so slow that it gets interrupted every time. Is there any other way to diagnose this?

| username: Min_Chen | Original post link

If the Clinic method is not convenient, you can export the monitoring data using the method described at PingCAP MetricsTool.
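
If even that export is too heavy, a single time series can usually be pulled straight from the cluster's Prometheus through its HTTP API (`/api/v1/query_range`). The sketch below only builds the request URL and does not send it. The address `127.0.0.1:9090`, the helper name, the time window, and the metric name `ticdc_processor_checkpoint_ts_lag` are assumptions for illustration, so check them against your own deployment and TiCDC version.

```python
from urllib.parse import urlencode


def build_range_query_url(prometheus_base: str, promql: str,
                          start_unix: int, end_unix: int,
                          step_s: int = 60) -> str:
    """Build a Prometheus range-query URL for one PromQL expression.

    start_unix and end_unix are Unix timestamps in seconds;
    step_s is the resolution of the returned series in seconds.
    """
    if end_unix <= start_unix:
        raise ValueError("end_unix must be later than start_unix")
    params = {
        "query": promql,
        "start": start_unix,
        "end": end_unix,
        "step": step_s,
    }
    return prometheus_base.rstrip("/") + "/api/v1/query_range?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder address and an assumed metric name; adjust both before use.
    url = build_range_query_url(
        "http://127.0.0.1:9090",
        "ticdc_processor_checkpoint_ts_lag",
        start_unix=1668756000,
        end_unix=1668759600,
    )
    print(url)
```

The printed URL can be opened with any HTTP client, and the JSON response covers only the requested metric and time window.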

| username: wluckdog | Original post link

calctidb-cluster-TiCDC_2022-11-18T07_33_28.835Z.rar (1.8 MB)