TiCDC Alert Metrics

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiCDC 告警指标

| username: TiDBer_yyy

[TiDB Usage Environment] Production
[TiDB Version] 5.0.4
[Encountered Issues]

  1. Why is there a significant delay?

  2. How to resolve these alerts?

Many alerts are firing.
Expression for the cdc_checkpoint_high_delay alert:
(time() - ticdc_processor_checkpoint_ts /1000) >600
Alert query results:

Check changefeed checkpoint_ts status:
image
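
To compare a changefeed's checkpoint_ts against the current time by hand, a raw TSO can be converted to wall-clock time. The following is a minimal sketch, assuming the checkpoint_ts shown in the changefeed status is a standard TiDB TSO (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The function names are hypothetical helpers, not part of any TiCDC tool.

```python
from datetime import datetime, timezone

# Assumption: a TiDB TSO stores the physical time in milliseconds in the
# high bits and a logical counter in the low 18 bits.
LOGICAL_BITS = 18


def tso_to_datetime(tso: int) -> datetime:
    """Convert a TSO to a timezone-aware UTC datetime (hypothetical helper)."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def checkpoint_lag_seconds(tso: int, now: datetime) -> float:
    """Return how far the checkpoint is behind `now`, in seconds."""
    return (now - tso_to_datetime(tso)).total_seconds()
```

Calling checkpoint_lag_seconds(checkpoint_ts, datetime.now(timezone.utc)) gives a rough lag that can be checked against the 600-second alert threshold.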

Grafana delay status:

| username: Meditator | Original post link

Click on the Grafana panel to see how the “changefeed checkpoint” metric is calculated.

| username: TiDBer_yyy | Original post link

  1. The two metrics are different and cannot be compared directly.
Changefeed checkpoint lag (Grafana dashboard):
max(ticdc_owner_checkpoint_ts_lag{tidb_cluster="$tidb_cluster", changefeed=~"$changefeed"}) by (changefeed)

cdc_checkpoint_high_delay alert expression:
(time() - ticdc_processor_checkpoint_ts /1000) >600

  2. How can I clear the cdc_checkpoint_high_delay alert?
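
For reference, the arithmetic in the alert expression can be sketched as below. This assumes, as the /1000 in the expression suggests, that ticdc_processor_checkpoint_ts is reported in milliseconds while time() is in seconds. The function names are illustrative only.

```python
import time

# Threshold taken from the alert expression: "> 600" (seconds).
ALERT_THRESHOLD_SECONDS = 600


def checkpoint_delay_seconds(checkpoint_ms, now_seconds=None):
    """Mirror of (time() - ticdc_processor_checkpoint_ts / 1000).

    Assumption: checkpoint_ms is the metric value in milliseconds.
    """
    if now_seconds is None:
        now_seconds = time.time()
    return now_seconds - checkpoint_ms / 1000


def alert_fires(checkpoint_ms, now_seconds=None):
    """True when the computed delay exceeds the 600-second threshold."""
    return checkpoint_delay_seconds(checkpoint_ms, now_seconds) > ALERT_THRESHOLD_SECONDS
```

Under this reading, the alert stays active for as long as the processor checkpoint is more than 10 minutes behind the current time, so it clears only once the checkpoint catches up.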
| username: yilong | Original post link

You can first refer to the documentation for troubleshooting:

| username: TiDBer_yyy | Original post link