TiKV Monitoring Issues

translator_bot · June 21, 2024, 7:21am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv监控问题

| username: 月明星稀

Dear experts in the forum, I have a question. TiKV manages multiple regions independently. How do you monitor the availability of regions?

translator_bot · June 21, 2024, 7:21am

| username: ShawnYan | Original post link

Check this out:

translator_bot · June 21, 2024, 7:21am

| username: 像风一样的男子 | Original post link

TiDB’s monitoring stack detects anomalies automatically, and the resulting alerts show up in Alertmanager. You can forward the alerts to a WeCom (Enterprise WeChat) group, as I did.

translator_bot · June 21, 2024, 7:21am

| username: dba远航 | Original post link

You can monitor it through Grafana.

translator_bot · June 21, 2024, 7:21am

| username: 烂番薯0 | Original post link

In Grafana, the Overview dashboard has a Region health panel.

translator_bot · June 21, 2024, 7:21am

| username: 月明星稀 | Original post link

Does TiKV have an interface to obtain region health information?

translator_bot · June 21, 2024, 7:21am

| username: 像风一样的男子 | Original post link

All of this data is collected by Prometheus. Look at how the Grafana panels are defined to see which metrics they query.

translator_bot · June 21, 2024, 7:21am

| username: 月明星稀 | Original post link

So by this point the read has already failed, and only then is an error reported, right?

translator_bot · June 21, 2024, 7:21am

| username: 有猫万事足 | Original post link

Using the IP of the host where Prometheus runs, open http://{prometheus_ip}:9090/targets.
That page lists the address Prometheus scrapes for PD metrics.

Open that address and look for the metrics that the Grafana panels are configured to use, as in the screenshot below.
[screenshot]

This is what you need.
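
The same lookup can be scripted: Prometheus serves the targets list as JSON at /api/v1/targets. Below is a minimal sketch that picks out the PD scrape addresses from that response. The job name "pd" is an assumption about how the scrape config is labelled in your deployment, and the helper names are only illustrative.

```python
import json
import urllib.request


def pd_scrape_urls(targets_response: dict, job: str = "pd") -> list[str]:
    """Return scrape URLs of healthy targets whose `job` label matches.

    `targets_response` is the decoded JSON body of /api/v1/targets.
    The default job name "pd" is an assumption; check your scrape config.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        t["scrapeUrl"]
        for t in active
        if t.get("labels", {}).get("job") == job and t.get("health") == "up"
    ]


def fetch_targets(prometheus_base: str) -> dict:
    """GET {prometheus_base}/api/v1/targets and decode the JSON body."""
    url = f"{prometheus_base}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

For example, pd_scrape_urls(fetch_targets("http://{prometheus_ip}:9090")) should return the same PD addresses that the /targets page shows, provided the job label matches.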

translator_bot · June 21, 2024, 7:21am

| username: kelvin | Original post link

There is Grafana for monitoring.

translator_bot · June 21, 2024, 7:21am

| username: 月明星稀 | Original post link

If Prometheus is not enabled, where else can this data be collected?

translator_bot · June 21, 2024, 7:21am

| username: Jellybean | Original post link

If you want to obtain the health information of a region, you can also check the cluster metadata using pd-ctl.
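
As a rough illustration of the pd-ctl route: the `region check` subcommand (for example `region check miss-peer`, `region check down-peer` or `region check pending-peer`) prints the matching Regions as JSON. The sketch below summarizes such output. The field names `count`, `regions` and `id` are assumptions about the output shape, so verify them against what your pd-ctl version prints.

```python
import json


def summarize_region_check(raw_json: str) -> dict:
    """Summarize the JSON text printed by a pd-ctl `region check` call.

    Assumes (unverified for every version) an object shaped like
    {"count": N, "regions": [{"id": ...}, ...]}.
    """
    doc = json.loads(raw_json)
    regions = doc.get("regions") or []  # tolerate a missing or null list
    return {
        "count": doc.get("count", len(regions)),
        "region_ids": sorted(r["id"] for r in regions if "id" in r),
    }
```

A non-zero count for checks such as down-peer or miss-peer is a reasonable signal that some Regions need attention.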

translator_bot · June 21, 2024, 7:21am

| username: 有猫万事足 | Original post link

Prometheus simply scrapes the metrics from this address. Without Prometheus, the only inconvenience is that you have to find the metric addresses yourself.

The endpoints that serve these metrics always exist; they are built into each component's code. The metrics keep being updated whether or not anyone reads them.

The PD interface is http://{pd_ip}:2379/metrics
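
If you read that endpoint directly, the response is in the Prometheus text format, one sample per line. Here is a minimal sketch of pulling one labelled gauge out of that text. The metric name `pd_regions_status` and its `type` label values used in the usage note are assumptions based on what the Region health panel appears to query, so confirm them against your own /metrics output.

```python
import re

# Matches: name{label="v",...} value   (labels optional, trailing timestamp ignored)
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def gauge_by_label(metrics_text: str, metric: str, label: str) -> dict[str, float]:
    """Map each value of `label` to the sample value of `metric`.

    Comment lines (# HELP / # TYPE) and other metrics are skipped.
    If two samples share the same label value, the later one wins.
    """
    out: dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = dict(_LABEL.findall(m.group(2) or ""))
        if label in labels:
            out[labels[label]] = float(m.group(3))
    return out
```

Calling gauge_by_label(text, "pd_regions_status", "type") on the body fetched from http://{pd_ip}:2379/metrics would then give one number per Region state, assuming the metric is exposed under that name.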

translator_bot · June 21, 2024, 7:21am

| username: 这里介绍不了我 | Original post link

Regularly call the Prometheus interface, and based on the return value, decide whether to trigger an alert.
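
A minimal sketch of that polling approach, using the Prometheus HTTP API instant-query endpoint /api/v1/query. The function names and the threshold are illustrative assumptions, and the check only handles instant-vector results.

```python
import json
import urllib.parse
import urllib.request


def instant_query(prometheus_base: str, promql: str) -> dict:
    """Run an instant query against Prometheus and decode the JSON body."""
    qs = urllib.parse.urlencode({"query": promql})
    url = f"{prometheus_base}/api/v1/query?{qs}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def breaches(response: dict, threshold: float) -> list[dict]:
    """Return the label sets of all series whose value exceeds `threshold`.

    Expects an instant-vector result, where each series carries
    "value": [timestamp, "number-as-string"].
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    result = response["data"]["result"]
    return [s["metric"] for s in result if float(s["value"][1]) > threshold]
```

Run from cron or a simple loop, a non-empty list from breaches(instant_query(base, query), 0) would be the trigger for sending an alert. Which query and threshold make sense depends on your cluster.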