How to Configure Automatic Restart Alerts for TiDB and TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb,tikv自动重启报警如何配置

| username: 路在何chu

[TiDB Usage Environment] Production Environment
[TiDB Version]
4.0.13
[Reproduction Path] What operations were performed when the issue occurred

[Encountered Issue: Issue Phenomenon and Impact]


We only have down alerts, no restart alerts. The most recent TiKV restart did not trigger any alert. I want to add monitoring for restarts, but I couldn't find any relevant information after a lot of searching. Has anyone configured this before?

| username: WalterWj | Original post link

Just monitor the uptime.

| username: 像风一样的男子 | Original post link

Take a look at the configuration file tidb-deploy/prometheus-8249/conf/tikv.rules.yml; it contains a restart alert rule:

  - alert: TiKV_node_restart

| username: 路在何chu | Original post link

No, my file doesn't have that rule. Should I add this expression: (time() - process_start_time_seconds{tidb_cluster="", job="tikv"})?

| username: 路在何chu | Original post link

And then set it to alert when the value is less than a certain threshold.
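
For illustration, the uptime idea in this thread could be written as a Prometheus alerting rule roughly like the sketch below. This is not the rule that ships with TiDB. The group name, the 300-second threshold, the label values, and the annotation text are all assumptions chosen for the example, and the label matcher is reduced to job="tikv".

```yaml
groups:
  - name: tikv-restart-sketch        # hypothetical group name
    rules:
      - alert: TiKV_node_restart
        # Uptime in seconds = current time minus the process start time.
        # Fires while a TiKV process has been up for less than 300 s
        # (assumed threshold; it must be longer than the scrape and
        # rule evaluation intervals, or a restart can be missed).
        expr: (time() - process_start_time_seconds{job="tikv"}) < 300
        labels:
          level: warning             # assumed severity label
        annotations:
          summary: TiKV process restarted recently
          description: 'instance {{ $labels.instance }} has been up for {{ $value }}s'
```

Another common way to express the same check in PromQL is changes(process_start_time_seconds{job="tikv"}[5m]) > 0, which fires when the start time changed within the window. Prometheus has to reload its rule files before either variant takes effect.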

| username: 像风一样的男子 | Original post link

Here is my configuration file for your reference:
tikv.rules.yml (14.3 KB)

| username: 路在何chu | Original post link

Okay, so this is something you added yourself later? It doesn't seem to be in the original file.

| username: 像风一样的男子 | Original post link

I didn't add it later; it ships with the default rules file.

| username: 路在何chu | Original post link

Is my version too low?

| username: 路在何chu | Original post link

It is indeed a version issue. My 6.5.5 cluster has the rule too, but I'm not sure whether the older version will recognize it if I add it.

| username: TiDBer_小阿飞 | Original post link

Your version is too low, right? Not sure if it will work, you can test it out.

| username: 像风一样的男子 | Original post link

The process_start_time_seconds metric was added in version 4.0.2, so the rule should take effect on your version.

| username: 路在何chu | Original post link

I tested it and it does work. Thank you.
