TiDB alerts in Prometheus: last backup date, RAM utilization, and slow queries

I am trying to create a TiDB alert with Prometheus. I want to monitor 3 metrics.

  1. The date of the last backup of my TiKV cluster

I have this

max(tikv_backup_range_duration_seconds_count{type="save"})

It gives me the duration, but I want the date, not the duration.

  2. The queries that take more than 1 second

I have this

histogram_quantile(0.99, sum by(le, instance)(rate(tidb_server_handle_query_duration_seconds_bucket[1m]))) > 1

That gives me the instance, but I want the query.

  3. The RAM utilization of TiDB, TiKV, and PD

I have this
sum by(instance) (tikv_store_size_bytes{type="available"}) / sum by(instance) (tikv_store_size_bytes{type="capacity"}) * 100 < 20

I want the RAM, not the disk size.

TiDB version: 6.0.1

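For the first metric, tikv_backup_range_duration_seconds is a histogram, so its _bucket and _count series only describe how long the backup ranges took and how many were processed, not when the backup ran. To get the date, you need a metric whose value is a Unix timestamp, such as tikv_backup_range_create_time_seconds, provided your TiKV version exposes it (check the component's /metrics output first). You can then use max_over_time to pick up the latest timestamp within a window and max to collapse the result across stores. Note that the PromQL timestamp function does not convert a value into a human-readable date: it returns the time of each sample, so wrapping the query in it would only give you the evaluation time. Leave the formatting to Grafana or to the alert template. Here's an example query:

max(max_over_time(tikv_backup_range_create_time_seconds{type="save"}[1h]))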
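
For an alert, the age of the last backup is usually more useful than the raw date. A minimal sketch, assuming the tikv_backup_range_create_time_seconds metric above is actually exposed by your TiKV version and holds a Unix timestamp in seconds; the 24-hour threshold is only an example value:

```promql
# Assumption: the metric value is the Unix time (seconds) at which the backup was created.
# time() is the evaluation time, so the difference is the age of the newest backup in seconds.
# The expression returns a result (and the alert fires) when that age exceeds 24 hours.
time() - max(tikv_backup_range_create_time_seconds{type="save"}) > 24 * 3600
```

The value returned is the age in seconds, which an alert annotation can render with a template function such as humanizeDuration.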

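For the second metric, histogram_quantile over tidb_server_handle_query_duration_seconds_bucket can only tell you which instance is slow, because that histogram has no per-statement label. You can try the tidb_server_slow_query metric instead; if it exposes a sql label in your version, that label contains the slow query. The example below assumes the metric is exported with a quantile label, in which case no histogram_quantile call is needed and you can filter on the value directly:

tidb_server_slow_query{quantile="0.99"} > 1

If the series has no sql label in your deployment, Prometheus cannot give you the statement text. In that case, read it from TiDB's slow query log, for example through the INFORMATION_SCHEMA.SLOW_QUERY table.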

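For the third metric, you can use the node_memory_MemTotal_bytes and node_memory_MemFree_bytes metrics from node_exporter to get the total and free memory of each host, then aggregate by instance and calculate the memory utilization as a percentage. Keep in mind that MemFree does not count page cache and buffers as free, so this ratio can look higher than the real memory pressure; if that matters, node_memory_MemAvailable_bytes is usually the better choice. Here's an example query: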

sum by(instance) ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / node_memory_MemTotal_bytes) * 100 > 80

Note that this query calculates the memory utilization of each host, not of the TiDB, TiKV, and PD processes themselves. The node_memory_* metrics cannot be split by process. If each component runs on its own host, you can filter on the instance label to select the hosts of one component; if several components share a host, you need a per-process memory metric instead.
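
To look at memory per component rather than per host, one option is the process-level metric exported by the Prometheus client libraries. This is a sketch under two assumptions: that each component exposes process_resident_memory_bytes, and that the scrape jobs are named tidb, tikv, and pd (check the job names in your prometheus.yml):

```promql
# Assumption: the job label values are "tidb", "tikv", and "pd"; adjust to your scrape config.
# Resident memory (RSS) of each process in GiB, grouped by component and instance.
sum by (job, instance) (process_resident_memory_bytes{job=~"tidb|tikv|pd"}) / 1024^3
```

Because this is an absolute value and not a percentage, the alert threshold has to be chosen per component, based on how much memory each host has.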