How to Monitor the Availability of a Clustered TiDB Installation

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb的集群化安装如何监控可用性

| username: 喵父666

[Test Environment] Testing environment
[TiDB Version] 7.1.2

| username: tidb菜鸟一只

I don’t quite follow the question. A TiDB installation already includes TiDB Dashboard and a Prometheus + Grafana stack for monitoring.

| username: Fly-bird

  1. Server-level monitoring and alerting is something you need to set up yourself.
  2. TiDB service monitoring and alerts can be set up in Grafana.

| username: swino

You can consider the following aspects:

  1. Monitoring Node Health: For each node in the TiDB cluster, you can monitor its health through a visual platform such as TiDB Dashboard or through system monitoring tools, covering metrics such as CPU, memory, disk, and network. If a node behaves abnormally or crashes, it should raise an alert promptly so that you can act to keep the cluster stable.
  2. Monitoring Service Availability: For services within the TiDB cluster, use heartbeat detection or similar methods to check their availability. For example, third-party tools or custom heartbeat scripts can check the running status of components such as TiDB, PD, and TiKV. Any service that is down or unavailable should trigger an alert and prompt action.
  3. Monitoring Cluster Throughput and Latency: Depending on the business needs and performance requirements of the TiDB cluster, you can monitor its throughput and latency to understand its performance and load. TiDB Dashboard, or the Prometheus and Grafana components deployed with TiDB, can collect and display this data.
  4. Regular Testing and Drills: The availability of the TiDB cluster also requires drills and testing. Regular disaster recovery drills, such as shutting down a node or a service, should be conducted to verify the high availability and recovery capabilities of the TiDB cluster.

In summary, a clustered TiDB installation calls for a combination of monitoring tools and techniques that together cover the nodes, services, and performance metrics of the cluster. Detecting and handling issues promptly is crucial to keeping the cluster highly available and its data safe.
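
As a rough illustration of the custom heartbeat script mentioned in point 2, here is a minimal Python sketch. The addresses are hypothetical, the ports are the usual defaults (10080 for the TiDB status port, 2379 for PD, 20180 for the TiKV status port) and may differ in your topology, and the helper names `http_ok`, `check_all`, and `unavailable` are made up for this example. It only checks that each HTTP endpoint answers; it does not replace the bundled Prometheus alerting.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical addresses; adjust hosts and ports to your own topology.
ENDPOINTS: Dict[str, str] = {
    "tidb": "http://127.0.0.1:10080/status",       # TiDB status port
    "pd": "http://127.0.0.1:2379/pd/api/v1/health",  # PD health API
    "tikv": "http://127.0.0.1:20180/metrics",      # TiKV status port
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def check_all(endpoints: Dict[str, str],
              probe: Callable[[str], bool] = http_ok) -> Dict[str, bool]:
    """Probe every endpoint and map component name -> reachable or not."""
    return {name: probe(url) for name, url in endpoints.items()}


def unavailable(results: Dict[str, bool]) -> List[str]:
    """Names of the components whose probe failed, sorted for stable output."""
    return sorted(name for name, ok in results.items() if not ok)
```

Running `unavailable(check_all(ENDPOINTS))` from a scheduler such as cron would give the list of components to alert on; how the alert is delivered is left out of this sketch.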

| username: zhanggame1

TiDB comes with both TiDB Dashboard and Prometheus + Grafana.

| username: Kongdom

:thinking: Monitoring availability? I usually look at the Overview page in Grafana or the instance page in TiDB Dashboard, or run tiup cluster display to see the cluster status.

| username: dba远航

Both TiDB Dashboard and Prometheus + Grafana can be used for monitoring.

| username: zxgaa

The built-in monitoring is sufficient.

| username: 随缘天空

Periodically open the cluster’s Dashboard to check node status and resource usage (CPU, memory, disk, and so on). You can also set up Prometheus alert rules.
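
Alongside alert rules, the same Prometheus instance can be polled directly. The sketch below is only an assumption-laden example, not part of TiDB's tooling: it uses the standard Prometheus HTTP API (`/api/v1/query`) to evaluate the built-in `up` metric and lists scrape targets that report 0. The helper names `query_up` and `down_targets` are hypothetical, and the Prometheus base URL must be supplied by you.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def query_up(prometheus_url: str, timeout: float = 5.0) -> dict:
    """Run the instant query `up` against a Prometheus server.

    prometheus_url is the base address of your own Prometheus instance,
    for example "http://127.0.0.1:9090" (an assumed value).
    """
    params = urllib.parse.urlencode({"query": "up"})
    url = f"{prometheus_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(payload: dict) -> List[str]:
    """Return "job@instance" for every target whose `up` value is 0."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # value arrives as a string
        if float(value) == 0:
            job = labels.get("job", "?")
            instance = labels.get("instance", "?")
            down.append(f"{job}@{instance}")
    return sorted(down)
```

Calling `down_targets(query_up(base_url))` returns an empty list when every scraped component is up. An instance that Prometheus has never scraped will not appear at all, so this complements a heartbeat check rather than replacing one.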
