Monitoring Metrics Not Collected After Replacing the Monitoring Node

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 替换监控节点不采集监控

| username: 18515065291

[TiDB Usage Environment] Production Environment
[TiDB Version]: 5.2.2, 5.2.3
[Encountered Issue]: Replacing Monitoring Machine
<1> Take the existing monitoring nodes offline, including Prometheus, Grafana, and Alertmanager.
<2> Then scale the monitoring components back out onto the same physical machine with the following topology (a sketch of the corresponding tiup commands is included after these steps):

monitoring_servers:
- host: 10.1.1.1
  port: 666
  deploy_dir: /work/tidb666/prometheus-777
  data_dir: /work/tidb666/prometheus-777/data
  log_dir: /work/tidb666/prometheus-777/log
  storage_retention: 5d

grafana_servers:
- host: 10.1.1.1
  port: 888
  deploy_dir: /work/tidb666/grafana-888

alertmanager_servers:
- host: 10.1.1.1
  web_port: 999
  cluster_port: 555
  deploy_dir: /work/tidb666/alertmanager-555
  data_dir: /work/tidb666/alertmanager-555/data
  log_dir: /work/tidb666/alertmanager-555/log

<3> After scaling out, the Prometheus metric probe_success showed that the three monitoring targets on 10.1.1.1 (node_exporter, blackbox_exporter, and grafana) were not being collected.
<4> Enabling the debug log level on blackbox_exporter revealed that it does not probe the IP 10.1.1.1 at all.
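
For reference, here is a minimal sketch of the tiup commands that steps <1> and <2> correspond to. The cluster name tidb-prod, the old instance ports, and the topology file name scale-out-monitoring.yaml are assumptions, not values from the original post:

# <1> Scale in the old monitoring instances; node IDs are host:port of the old
# Prometheus, Grafana, and Alertmanager (default ports assumed here)
tiup cluster scale-in tidb-prod --node 10.1.1.1:9090,10.1.1.1:3000,10.1.1.1:9093

# <2> Scale the new monitoring components back out using the topology above,
# saved as scale-out-monitoring.yaml
tiup cluster scale-out tidb-prod scale-out-monitoring.yaml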

| username: Min_Chen | Original post link

Hello,

Please check whether these three targets are configured in the Prometheus configuration file.
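
A quick way to check this on the new monitoring host, assuming the deploy_dir from the topology above and that tiup renders the generated Prometheus configuration under its conf directory (the exact path is an assumption):

# Look for the new host in the generated scrape configuration and target lists
grep -rn "10.1.1.1" /work/tidb666/prometheus-777/conf/

# List the targets Prometheus has actually loaded
# (666 is the Prometheus port from the topology above)
curl -s http://10.1.1.1:666/api/v1/targets

If 10.1.1.1 does not appear in the generated configuration, the scrape targets were not regenerated during the scale-out.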

| username: srstack | Original post link

Try running tiup cluster reload <cluster-name> -R prometheus.
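
A sketch of that, together with a quick check afterwards; the cluster name tidb-prod is an assumption, and 666 is the Prometheus port from the topology above:

# Regenerate and push the Prometheus configuration, then restart the Prometheus service
tiup cluster reload tidb-prod -R prometheus

# Check whether probe_success now reports the targets on 10.1.1.1
curl -s 'http://10.1.1.1:666/api/v1/query?query=probe_success'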