Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv监控问题 (TiKV monitoring question)
Dear experts in the forum, I have a question. TiKV manages its data as many independent Regions. How do you monitor the availability of those Regions?
TiDB’s monitoring stack detects anomalies automatically, and you can see the alerts in Alertmanager. You can forward the alerts to a corporate WeChat group, as I did.
You can monitor it through Grafana.
In Grafana, there is an Overview dashboard, and it has a Region health panel.
Does TiKV have an interface to obtain region health information?
The data is all collected by Prometheus. You can look through the Grafana panels yourself to see which metrics they query.
But by that point a read has already failed, and only then is the error reported, right?
On the host where Prometheus runs, open http://{prometheus_ip}:9090/targets.
That page lists the address Prometheus scrapes for PD metrics.
Open that address and look for the metrics that the Grafana panel uses.
As shown in the figure below.

This is what you need.
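If you would rather not read the targets page by hand, Prometheus also serves the same list as JSON at /api/v1/targets. Below is a minimal sketch that pulls the scrape addresses out of that response. The job name "pd" and the sample addresses are assumptions; check the job names in your own Prometheus configuration.

```python
import json
import urllib.request


def scrape_urls(targets_payload, job=None):
    """Return (job, scrapeUrl, health) tuples from a /api/v1/targets response."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    rows = []
    for target in active:
        target_job = target.get("labels", {}).get("job", "")
        if job is not None and target_job != job:
            continue  # keep only the requested job, if one was given
        rows.append((target_job, target.get("scrapeUrl", ""), target.get("health", "")))
    return rows


def fetch_targets(prometheus_base_url, timeout=5):
    """Fetch the target list from a Prometheus server (network call)."""
    url = prometheus_base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder address, job name "pd" is an assumption):
#   for job, url, health in scrape_urls(fetch_targets("http://127.0.0.1:9090"), job="pd"):
#       print(job, url, health)
```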
There is Grafana for monitoring.
If Prometheus is not enabled, where else can this data be collected?
If you want to obtain the health information of a region, you can also check the cluster metadata using pd-ctl.
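As a rough illustration of the pd-ctl route, the sketch below wraps `pd-ctl region check <type>` and reads the JSON it prints. It assumes pd-ctl is on PATH and that the output is an object with "count" and "regions" fields; verify both against your PD version. The helper names are hypothetical.

```python
import json
import subprocess

# Some check types accepted by `pd-ctl region check`; adjust to your PD version.
CHECK_TYPES = ("miss-peer", "extra-peer", "down-peer", "pending-peer")


def parse_region_check(output):
    """Return (count, region_ids) from the JSON printed by `region check`."""
    data = json.loads(output)
    regions = data.get("regions") or []  # tolerate a null or missing list
    return data.get("count", len(regions)), [r.get("id") for r in regions]


def run_region_check(pd_url, check_type, pd_ctl="pd-ctl"):
    """Run pd-ctl against pd_url and return the parsed result."""
    cmd = [pd_ctl, "-u", pd_url, "region", "check", check_type]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_region_check(done.stdout)


# Usage (placeholder PD address):
#   for check in CHECK_TYPES:
#       count, ids = run_region_check("http://127.0.0.1:2379", check)
#       print(check, count, ids[:10])
```

A count above zero is not always a problem, because Regions can briefly miss a peer while PD is scheduling, so it is usually worth checking whether the count stays above zero.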
Prometheus only scrapes metrics from these addresses. Without Prometheus, the addresses are simply less convenient to look up.
The endpoints that expose these metrics are built into each component, so they have always been there. The metrics are kept up to date whether or not anything reads them.
The PD endpoint is http://{pd_ip}:2379/metrics
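Without Prometheus, you can read that endpoint directly, since it returns plain text in the Prometheus exposition format. Here is a minimal sketch that extracts Region status gauges from it. The metric name `pd_regions_status` and its `type` label are assumptions based on what the Region health panel appears to query; confirm them in your own /metrics output.

```python
import re
import urllib.request

# One sample line in the text format: name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def region_status(metrics_text, metric="pd_regions_status"):
    """Map the `type` label of each matching sample to its numeric value."""
    status = {}
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        m = SAMPLE_LINE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        status[labels.get("type", "")] = float(m.group("value"))
    return status


def fetch_metrics(pd_base_url, timeout=5):
    """Download the raw metrics text from a PD instance (network call)."""
    url = pd_base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (placeholder address):
#   print(region_status(fetch_metrics("http://127.0.0.1:2379")))
```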
Poll the Prometheus HTTP API on a schedule, and decide from the returned values whether to trigger an alert.
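One way to sketch that polling check is with the Prometheus instant-query endpoint, /api/v1/query. The PromQL expression, the threshold, and the helper names below are illustrative assumptions, not values from this thread.

```python
import json
import urllib.parse
import urllib.request


def query_url(prometheus_base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return prometheus_base_url.rstrip("/") + "/api/v1/query?" + params


def breaches(query_payload, threshold=0.0):
    """Return (labels, value) for every series whose value exceeds threshold."""
    if query_payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % query_payload.get("error"))
    hits = []
    for series in query_payload["data"]["result"]:
        value = float(series["value"][1])  # value is a [timestamp, "string"] pair
        if value > threshold:
            hits.append((series["metric"], value))
    return hits


def check_once(prometheus_base_url, promql, threshold=0.0, timeout=5):
    """Run one query against Prometheus and return the breaching series."""
    url = query_url(prometheus_base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return breaches(json.load(resp), threshold)


# Usage (placeholder address; the metric name is an assumption):
#   hits = check_once("http://127.0.0.1:9090",
#                     'pd_regions_status{type="down-peer-region-count"}')
#   if hits:
#       print("alert:", hits)
```

Run `check_once` from cron or a simple loop. Alerting only after several consecutive breaches helps avoid noise from short-lived spikes.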