Abnormal TiCDC Monitoring Metrics

translator_bot June 23, 2024, 3:56am 1

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TICDC 监控指标异常

| username: TiDBer_yyy

Database version: 5.0.4

CDC metrics:
curl -i http://cdcip:8300/metrics returns data, but the Grafana dashboard shows none.
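
To confirm that the endpoint exposes TiCDC's own metrics and not just Go runtime metrics, the curl output can be filtered by metric name. The following is a minimal sketch, not an official tool. It assumes TiCDC metric names start with ticdc_; the helper names metric_names and fetch_metrics are hypothetical, and the metric names in the usage comment are made up for illustration.

```python
import urllib.request


def metric_names(exposition_text, prefix="ticdc_"):
    """Return the sorted, distinct metric names that start with `prefix`.

    `exposition_text` is the plain-text body returned by a /metrics endpoint.
    The default prefix is an assumption about TiCDC's metric naming.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        # A sample line looks like `name{label="v"} 1` or `name 1`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metrics(url, timeout=5):
    """Download the raw metrics text from `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (needs network access to the TiCDC node, so it is left commented out):
# text = fetch_metrics("http://cdcip:8300/metrics")
# print(len(metric_names(text)), "TiCDC metric names exposed")
```

If this returns names but the dashboard is still empty, the problem is downstream of TiCDC: either Prometheus is not scraping the endpoint or Grafana is not reading from Prometheus.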


Deployment method: scale-out (Scale TiDB/TiKV/PD/TiCDC nodes).
After running reload -R=prometheus,grafana, the dashboard still has no monitoring data.

Additionally, many panels on the monitoring dashboard show "No data".

The symptom is very similar to this post: TiCDC监控看板没有监控数据 - TiDB 的问答社区. I followed the steps in that post, but the problem persists.

translator_bot June 23, 2024, 3:56am 2

| username: jansu-dev | Original post link

Have you created a changefeed? If not, many panels in the changefeed section will be empty, because no changefeed replication task exists.

translator_bot June 23, 2024, 3:56am 3

| username: TiDBer_yyy | Original post link

Yes, they were created earlier. There are now 5 or 6 changefeeds.

translator_bot June 23, 2024, 3:56am 4

| username: jansu-dev | Original post link

Troubleshooting steps:

Since the TiCDC endpoint returns metrics, first check whether they are present in Prometheus.
If they are not, check the Prometheus logs for clues, for example the port not being exposed or the TiCDC target missing from the Prometheus configuration altogether.
If they are, check whether the query expressions in Grafana are correct.

Work through these steps in order and you will find the cause.
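
The first two checks can be sketched against the Prometheus HTTP API, whose /api/v1/targets endpoint lists every scrape target with its job label, health, and last error. This is only an illustration under assumptions: the scrape job is assumed to be named ticdc, Prometheus is assumed to listen on port 9090, and the function names and result strings are hypothetical.

```python
import json
import urllib.request


def targets_for_job(payload, job):
    """Pick the active targets whose `job` label equals `job`.

    `payload` is the decoded JSON body of GET /api/v1/targets.
    """
    active = payload.get("data", {}).get("activeTargets", [])
    return [t for t in active if t.get("labels", {}).get("job") == job]


def diagnose(payload, job="ticdc"):
    """Classify the scrape state of `job` (the default job name is an assumption)."""
    targets = targets_for_job(payload, job)
    if not targets:
        # Prometheus does not know the job: its configuration lacks the target.
        return "missing-from-config"
    if any(t.get("health") != "up" for t in targets):
        # The target is configured but scrapes fail; inspect each `lastError`.
        return "scrape-failing"
    # Prometheus is collecting data, so look at the Grafana side next.
    return "scraped-ok"


def fetch_targets(prometheus_url, timeout=5):
    """Fetch and decode the target list from a Prometheus server."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (host and port are placeholders, so it is left commented out):
# print(diagnose(fetch_targets("http://prometheus-host:9090")))
```

A result of missing-from-config points to the Prometheus configuration never having been regenerated after TiCDC was added, scrape-failing points to a network or port problem, and scraped-ok moves the investigation to the Grafana data source and panel expressions.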

translator_bot June 23, 2024, 3:56am 5

| username: TiDBer_yyy | Original post link

Querying Prometheus with PromQL returns no data for TiCDC.

A query against TiDB metrics returns results:

[screenshot]
A query against TiCDC metrics returns nothing:

[screenshot]

The Prometheus logs look normal.
The Grafana logs show an error: after a restart, the message "could not find datasource: data source not found" appears, similar to the issue described in grafana 很多alert 都在告警，Execution Error: Could not find datasource Data source not found 但是没达到报警阈值 - TiDB 的问答社区.

translator_bot June 23, 2024, 3:56am 6

| username: jansu-dev | Original post link

Does reloading, as described in that post, restore it?
It looks like Prometheus is not scraping data from TiCDC at all. Keep at it~

translator_bot June 23, 2024, 3:56am 7

| username: TiDBer_yyy | Original post link

Reloading Prometheus and Grafana again restored the monitoring data.
