Alertmanager failed to start

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: alertmanager启动失败

| username: 孤独的狼


[tidb@tidb-30-116 log]$ tiup cluster display tidb-test
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.8.2/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v4.0.9
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.17.30.118:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
172.17.30.116:9093 alertmanager 172.17.30.116 9093/9094 linux/x86_64 Down /tidb-data/alertmanager-9093 /tidb-deploy/alertmanager-9093
172.17.30.116:3000 grafana 172.17.30.116 3000 linux/x86_64 Down - /tidb-deploy/grafana-3000
172.17.30.117:2379 pd 172.17.30.117 2379/2380 linux/x86_64 Up /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.17.30.118:2379 pd 172.17.30.118 2379/2380 linux/x86_64 Up|UI /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.17.30.119:2379 pd 172.17.30.119 2379/2380 linux/x86_64 Up|L /tidb-data/pd-2379 /tidb-deploy/pd-2379
172.17.30.116:9090 prometheus 172.17.30.116 9090 linux/x86_64 Down /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
172.17.30.117:4000 tidb 172.17.30.117 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
172.17.30.118:4000 tidb 172.17.30.118 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
172.17.30.119:4000 tidb 172.17.30.119 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
172.17.30.117:20160 tikv 172.17.30.117 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.17.30.118:20160 tikv 172.17.30.118 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
172.17.30.119:20160 tikv 172.17.30.119 20160/20180 Pending Offline /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
Total nodes: 12
[tidb@tidb-30-116 log]$

Attachments: prometheus.log (1.2 MB), grafana.log.2022-07-13.001 (33.5 MB), alertmanager.log (2.3 KB)

The cluster was shut down normally, but after the host was rebooted, the monitoring components (Prometheus, Grafana, and Alertmanager) failed to start with the cluster.
See the attached logs for details.

| username: songxuecheng

Delete the files in the corresponding directory and restart.

| username: db_user

What is the background here? Have you performed a scale-in operation? Check the TiKV logs; one TiKV node is currently in an abnormal state (Pending Offline).
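
To see at a glance which instances are not healthy, the `tiup cluster display` output can be filtered. The following is only a sketch: `not_up` is a hypothetical helper name, and it assumes that every instance row starts with `ip:port` and that healthy rows carry a status such as `Up`, `Up|L`, or `Up|UI`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ID Role" for every instance whose status is not Up.
# Assumption: instance rows start with "ip:port"; healthy rows contain a field
# that is exactly "Up" or starts with "Up|" (for example "Up|L", "Up|UI").
not_up() {
  awk '$1 ~ /^[0-9.]+:[0-9]+$/ {
    up = 0
    for (i = 2; i <= NF; i++) if ($i ~ /^Up([|]|$)/) up = 1
    if (!up) print $1, $2
  }'
}

# Usage on the control machine:
#   tiup cluster display tidb-test | not_up
```

For the output posted in this thread, this would list the alertmanager, grafana, prometheus, and Pending Offline tikv rows.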

| username: 孤独的狼

Yes, one of the TiKV instances didn't start. Is that related? Does it prevent the monitoring components from starting?

| username: 孤独的狼

To bring monitoring back up, should I start the monitoring components individually, or start the whole cluster and let that bring them up?

| username: songxuecheng

You can start it manually.

| username: 孤独的狼

How do I start them manually? Could you provide a general command template?

| username: songxuecheng

tiup cluster restart tidb-cluster -R prometheus
tiup cluster restart tidb-cluster -R grafana
tiup cluster restart tidb-cluster -R alertmanager
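
The three commands above use `tidb-cluster` as a placeholder; the cluster in this thread is named `tidb-test`. A minimal sketch that wraps them in a loop might look like the following. `CLUSTER`, `DRY_RUN`, and `restart_role` are illustrative names, not part of tiup, and by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: restart only the monitoring roles, one role at a time.
# CLUSTER and DRY_RUN are illustrative variables, not tiup settings.
CLUSTER="${CLUSTER:-tidb-test}"   # cluster name taken from the display output
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = run them

restart_role() {
  local role="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "tiup cluster restart ${CLUSTER} -R ${role}"
  else
    tiup cluster restart "${CLUSTER}" -R "${role}"
  fi
}

for role in prometheus grafana alertmanager; do
  restart_role "$role"
done
```

Run it with `DRY_RUN=0` to actually restart the components, then check the result with `tiup cluster display tidb-test`.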

| username: ShawnYan

Check whether this post describes a similar issue: "[tidb-v5]安装完TiDB集群,alertmanager 无法启动" ([tidb-v5] Alertmanager fails to start after installing a TiDB cluster) - TiDB Q&A community
