TiKV Monitoring Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv监控问题

| username: 月明星稀

Dear experts in the forum, I have a question. TiKV manages many Regions independently. How can I monitor the availability of those Regions?

| username: ShawnYan | Original post link

Check this out:

| username: 像风一样的男子 | Original post link

TiDB’s monitoring system detects anomalies automatically, and the alerts show up in Alertmanager. You can also forward the alerts to a WeCom (Enterprise WeChat) group, like I did.

| username: dba远航 | Original post link

You can monitor it through Grafana.

| username: 烂番薯0 | Original post link

In Grafana, the Overview dashboard has a Region health panel.

| username: 月明星稀 | Original post link

Does TiKV have an interface to obtain region health information?

| username: 像风一样的男子 | Original post link

All of this data is collected by Prometheus. Explore the Grafana dashboards yourself to see how it is used.

| username: 月明星稀 | Original post link

By that point the read has already failed and an error gets reported, right?

| username: 有猫万事足 | Original post link

Open http://{prometheus_ip}:9090/targets, using the IP of the host where Prometheus runs.
That page lists the addresses from which the PD metrics are scraped.

Open that address and look for the metrics that the Grafana panels are configured with.
(The original post showed a screenshot here.)

This is what you need.
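
The same list that the /targets page shows is also available from Prometheus's HTTP API at `/api/v1/targets`. Here is a minimal Python sketch of reading it. The job name `pd` and the sample payload are assumptions made for illustration, so check the job names in your own Prometheus configuration.

```python
import json
import urllib.request


def fetch_targets(prometheus_url):
    """Return the decoded JSON from Prometheus's /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def scrape_urls(targets_payload, job):
    """List (scrapeUrl, health) pairs for the active targets of one job."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        (t["scrapeUrl"], t["health"])
        for t in active
        if t.get("labels", {}).get("job") == job
    ]


# Hand-written sample payload (not real cluster output), trimmed to the
# fields that scrape_urls() reads.
sample_targets = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "pd"}, "scrapeUrl": "http://10.0.0.1:2379/metrics", "health": "up"},
            {"labels": {"job": "tikv"}, "scrapeUrl": "http://10.0.0.2:20180/metrics", "health": "up"},
        ]
    },
}

print(scrape_urls(sample_targets, "pd"))
```

Against a live cluster you would call `scrape_urls(fetch_targets("http://{prometheus_ip}:9090"), "pd")` instead of using the sample.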

| username: kelvin | Original post link

There is Grafana for monitoring.

| username: 月明星稀 | Original post link

If Prometheus is not enabled, where else can this data be collected?

| username: Jellybean | Original post link

If you want to obtain the health information of a region, you can also check the cluster metadata using pd-ctl.
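
As a rough sketch of the pd-ctl route, the script below wraps `pd-ctl region check <type>` and counts the Regions it returns. It assumes a `pd-ctl` binary is on the PATH and that the command prints JSON with a `count` field. Both are assumptions to verify against your PD version.

```python
import json
import subprocess

# Abnormal-Region categories to check; adjust to the types your pd-ctl supports.
CHECK_TYPES = ("miss-peer", "extra-peer", "down-peer", "pending-peer")


def build_command(pd_url, check_type):
    """Build the pd-ctl invocation for one abnormal-Region category."""
    return ["pd-ctl", "-u", pd_url, "region", "check", check_type]


def region_count(pd_ctl_output):
    """Read the number of Regions from pd-ctl output (assumed to be JSON with 'count')."""
    return json.loads(pd_ctl_output).get("count", 0)


def check_regions(pd_url):
    """Run every check and return {check_type: number_of_regions}."""
    counts = {}
    for check_type in CHECK_TYPES:
        proc = subprocess.run(
            build_command(pd_url, check_type),
            capture_output=True, text=True, check=True,
        )
        counts[check_type] = region_count(proc.stdout)
    return counts
```

For example, `check_regions("http://{pd_ip}:2379")` returns one count per category, and any non-zero value is worth a closer look.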

| username: 有猫万事足 | Original post link

Prometheus just collects the metrics from this address. Without Prometheus, it is only less convenient to find the addresses that expose these metrics.

The interfaces that provide these metrics have always existed and are built into the code of each component. The metrics keep being produced even when nothing is reading them.

The PD interface is http://{pd_ip}:2379/metrics
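
The /metrics endpoint returns plain text in the Prometheus exposition format, so it can be read without Prometheus. Below is a minimal sketch of a parser. The metric name `pd_regions_status` and its `type` label values are assumptions based on what the Grafana Region health panel usually queries, and the sample text is hand-written. Confirm the names against your own /metrics output.

```python
import re
import urllib.request

# One sample line: name, optional {labels}, then the value.
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(pd_url):
    """Download the raw metrics text from a component's /metrics endpoint."""
    with urllib.request.urlopen(f"{pd_url}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_metric(text, metric_name):
    """Return [(labels_dict, value)] for every sample of metric_name."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = LINE.match(line)
        if m and m.group("name") == metric_name:
            labels = dict(LABEL.findall(m.group("labels") or ""))
            samples.append((labels, float(m.group("value"))))
    return samples


# Hand-written sample text (not real cluster output).
sample_metrics = """\
# TYPE pd_regions_status gauge
pd_regions_status{type="miss-peer-region-count"} 0
pd_regions_status{type="down-peer-region-count"} 2
pd_cluster_status{type="store_up_count"} 3
"""

for labels, value in parse_metric(sample_metrics, "pd_regions_status"):
    print(labels["type"], value)
```

With a live PD you would pass `fetch_metrics("http://{pd_ip}:2379")` to `parse_metric` instead of the sample text.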

| username: 这里介绍不了我 | Original post link

Regularly call the Prometheus interface, and based on the return value, decide whether to trigger an alert.
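
A minimal sketch of that idea, using Prometheus's instant-query API (`/api/v1/query`). The PromQL expression, the `pd_regions_status` metric name, and the threshold of 0 are assumptions for illustration. The sample response is hand-written in the shape the API returns.

```python
import json
import urllib.parse
import urllib.request

# Assumed query; replace it with the expression your Grafana panel uses.
QUERY = 'pd_regions_status{type=~"miss-peer-region-count|down-peer-region-count"}'


def query_prometheus(prometheus_url, promql):
    """Run an instant query against Prometheus and return the decoded JSON."""
    url = f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def abnormal_series(payload, threshold=0):
    """Return [(type_label, value)] for every series whose value exceeds threshold."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    alerts = []
    for series in payload["data"]["result"]:
        value = float(series["value"][1])  # value is a [timestamp, "string"] pair
        if value > threshold:
            alerts.append((series["metric"].get("type", ""), value))
    return alerts


# Hand-written sample response (not real cluster output).
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"type": "miss-peer-region-count"}, "value": [1700000000, "0"]},
            {"metric": {"type": "down-peer-region-count"}, "value": [1700000000, "2"]},
        ],
    },
}

print(abnormal_series(sample_response))
```

Run from cron or a loop, `abnormal_series(query_prometheus("http://{prometheus_ip}:9090", QUERY))` returns an empty list when everything is healthy, and anything else can be passed to whatever alerting channel you use.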