Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB cluster system health check failed: the required component NgMonitoring is not started in the cluster, so some features will be unavailable
[TiDB Environment] Production Environment
[TiDB Version] v6.4.0
Tried restarting the Prometheus node but it didn’t solve the issue.
After restarting the TiDB cluster, this component still cannot be started.
Check the logs. What error do they report?
It looks like Prometheus initialization failed.
Based on the information you provided, the TiDB Dashboard shows that the cluster health check failed because the required component NgMonitoring is not started. This could be due to one of the following reasons:
- The NgMonitoring component was not correctly deployed or started.
- The NgMonitoring component failed to start, possibly due to configuration errors or other issues.
To resolve this issue, you can follow these steps:
1. Confirm whether the NgMonitoring component has been correctly deployed and started. You can check with the following command:
$ tiup cluster display <cluster-name>
If the NgMonitoring component is not correctly deployed or started, try redeploying or starting it. For the specific steps, see Enable Continuous Profiling.
2. If the NgMonitoring component has been correctly deployed and started but the issue persists, check whether its configuration is correct. Specifically, check the NgMonitoring configuration file to make sure it is consistent with the configuration of the other components.
3. If you still cannot resolve the issue, check the logs of the TiDB, TiKV, and PD components for other errors or anomalies. The log files are kept in the log subdirectory of each component's deploy directory, which is listed in the Deploy Dir column of the output of:
$ tiup cluster display <cluster-name>
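For the log check in step 3, a small shell helper can filter a component log down to the lines that look like failures. This is a minimal sketch: the function name `show_log_errors` is hypothetical, and the keyword list (error, fatal, panic) is an assumption about what is worth looking at, not an official TiDB log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent log lines that look like failures.
# The log path is an assumption: TiUP usually keeps logs under <deploy-dir>/log,
# where <deploy-dir> is the "Deploy Dir" column of `tiup cluster display`.
show_log_errors() {
  local log_file="$1"
  local max_lines="${2:-20}"   # how many matching lines to keep (default 20)
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # Case-insensitive match on common failure keywords; keep only the latest hits.
  grep -iE 'error|fatal|panic' "$log_file" | tail -n "$max_lines"
}
```

Usage: `show_log_errors <deploy-dir>/log/<log-file>`, where the file name depends on the component. The second argument limits how many matches are printed, which keeps the output readable when a failing component logs the same error on every retry.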
Take a look at this: it is recommended to first try deleting /tidb-data/prometheus-9090/docdb and then restarting.
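A more cautious version of that suggestion is to move docdb aside instead of deleting it, so it can be restored if the restart does not help. This is a minimal sketch: the function name `backup_docdb` is hypothetical, and the default data directory is simply the path from the reply above, so adjust it to your own deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename NgMonitoring's docdb instead of deleting it.
# Default path is taken from the forum reply; pass your own data dir as $1.
backup_docdb() {
  local data_dir="${1:-/tidb-data/prometheus-9090}"
  local docdb="$data_dir/docdb"
  if [ ! -e "$docdb" ]; then
    echo "no docdb found at $docdb" >&2
    return 1
  fi
  local backup
  backup="$docdb.bak.$(date +%Y%m%d%H%M%S)"
  mv "$docdb" "$backup" || return 1
  # Print the backup path so it can be moved back with mv if needed.
  echo "$backup"
}
```

Usage, assuming NgMonitoring is stopped and started together with the Prometheus instance (as it is deployed on the Prometheus node): run `tiup cluster stop <cluster-name> -R prometheus`, then `backup_docdb`, then `tiup cluster start <cluster-name> -R prometheus`. Stopping the service first avoids renaming the directory while the process still holds it open.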