Are there other methods for monitoring a TiDB cluster without relying on Grafana?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB集群不依赖grafana监控,是否还有其他的手段

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] V6.5.8
[Encountered Problem: Problem Phenomenon and Impact] TiDB’s Grafana monitoring is undoubtedly very comprehensive. However, when there are many TiDB clusters, are there other ways to collect monitoring data (preferably API interfaces) so that all clusters can be unified into one monitoring platform? I found some information in the community documentation (TiDB Cluster Monitoring API | PingCAP Documentation Center), but it is somewhat incomplete, especially for tidb-server metrics such as connection count, QPS, row changes per second, and slow requests. Has anyone looked into this in depth?

| username: zhanggame1 | Original post link

Grafana retrieves its data from Prometheus, so your monitoring platform can pull data from Prometheus as well.

| username: redgame | Original post link

You can do this through the TiDB monitoring API; it is complete.

| username: Fly-bird | Original post link

Grafana is currently a fairly mainstream monitoring and observability component. If you want to integrate it into your own platform, you can create the charts in Grafana and then embed them in your platform.

| username: residentevil | Original post link

Is it here?

| username: DBRE | Original post link

Yes, Prometheus uses the metrics interfaces of various TiDB components to obtain monitoring data.

| username: residentevil | Original post link

May I ask if Prometheus is a component installed by monitoring_servers?

| username: zhang_2023 | Original post link

The interfaces exist, so any mainstream monitoring system can be used.

| username: 连连看db | Original post link

TiDB’s own monitoring already covers a wide range, and specific needs can be met with a custom exporter that collects the extra data.

| username: kelvin | Original post link

All the interfaces should be available, right?

| username: DBRE | Original post link

Yes.

| username: residentevil | Original post link

This monitoring_servers deployment might also be a single point of failure :sweat_smile:

| username: onlyacat | Original post link

Grafana is only a monitoring dashboard; the data itself is in Prometheus. As long as your monitoring platform supports PromQL, it will work.
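As a minimal sketch of that idea, the snippet below sends a PromQL instant query to the Prometheus HTTP API (`/api/v1/query`) and maps each returned series to its value. The function names are hypothetical helpers, and the metric names mentioned in the usage note (`tidb_server_connections`, `tidb_server_query_total`) are assumptions that should be checked against what your cluster's Prometheus actually stores.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return prometheus_base.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_result(payload: dict) -> dict:
    """Map each series' 'instance' label to its value as a float."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    values = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "")
        _timestamp, value = series["value"]  # the value arrives as a string
        values[instance] = float(value)
    return values


def query(prometheus_base: str, promql: str, timeout: float = 5.0) -> dict:
    """Run one instant query against a cluster's Prometheus."""
    url = build_query_url(prometheus_base, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_result(json.load(resp))
```

For example, `query("http://<prometheus_address>:9090", "sum(tidb_server_connections) by (instance)")` would return connection counts per tidb-server, and `sum(rate(tidb_server_query_total[1m])) by (instance)` would approximate QPS, assuming Prometheus listens on port 9090 and those metrics exist in your version. With many clusters, the same function can be called once per cluster's Prometheus address.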

| username: DBRE | Original post link

Yes, it is a single point. Grafana and Alertmanager are single points as well.

| username: 小龙虾爱大龙虾 | Original post link

Your platform can either connect directly to the cluster’s Prometheus or collect the metrics from each component node itself. The first method is recommended.

| username: residentevil | Original post link

I wonder whether Prometheus also calls the PD interface to obtain this basic monitoring data.

| username: TiDBer_aaO4sU46 | Original post link

Pulling the data from Prometheus is easy; using the API directly is more complicated.

| username: residentevil | Original post link

We need to study which nodes Prometheus is collecting data from in TiDB, haha.

| username: 小龙虾爱大龙虾 | Original post link

Sure. Check the target addresses in the Prometheus configuration file; they should point to the /metrics path on each component’s status port. You can test it against TiDB: run curl http://<tidb_address>:10080/metrics, replacing <tidb_address> with the address of your TiDB server. You can think of each component as having a built-in exporter.
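If you scrape a component directly instead of going through Prometheus, the response is in the Prometheus text exposition format. The following is a rough sketch of reading it from Python; `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the parser is deliberately simplified (it leaves escape sequences in label values untouched and ignores timestamps).

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: name="value".
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:  # skip anything this simplified parser cannot read
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(metrics_url: str, timeout: float = 5.0) -> list:
    """Download and parse one component's /metrics page."""
    with urllib.request.urlopen(metrics_url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics("http://<tidb_address>:10080/metrics")` would then give a list you can filter by metric name. Note that counters here are raw cumulative values, so rates such as QPS have to be computed from two successive scrapes, which is one reason querying Prometheus is the simpler route.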

| username: residentevil | Original post link

I just tested it, and the metrics interface of tidb-server works. I’ll check whether the TiKV component has the same interface. Thank you very much.
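For checking the other components in the same way, here is a small sketch that builds the /metrics URL per component. The port numbers are the commonly used defaults (TiDB status port 10080, TiKV status port 20180, PD client port 2379) and are an assumption about your deployment; the scrape targets in the Prometheus configuration file are the authoritative source. `metrics_url` is a hypothetical helper name.

```python
from typing import Optional

# Assumed default ports; override them if your topology uses other values.
DEFAULT_METRICS_PORTS = {
    "tidb": 10080,  # tidb-server status port
    "tikv": 20180,  # tikv-server status port
    "pd": 2379,     # pd-server client port
}


def metrics_url(component: str, host: str, port: Optional[int] = None) -> str:
    """Build the metrics endpoint URL for one component instance."""
    if port is None:
        try:
            port = DEFAULT_METRICS_PORTS[component]
        except KeyError:
            raise ValueError(f"no default port known for {component!r}")
    return f"http://{host}:{port}/metrics"
```

For instance, `metrics_url("tikv", "<tikv_address>")` produces a URL that can be tested with curl, just like the TiDB one.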