System health check of TiDB cluster failed: essential component `NgMonitoring` not started in the cluster, some features will be unavailable

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群的系统健康检查失败 集群中未启动必要组件 NgMonitoring,部分功能将不可用

| username: TiDBer_vJpITQ5J

[TiDB Environment] Production Environment
[TiDB Version] v6.4.0
Tried restarting the Prometheus node but it didn’t solve the issue.

| username: TiDBer_vJpITQ5J | Original post link

[Screenshot not available]

| username: TiDBer_vJpITQ5J | Original post link

After restarting the TiDB cluster, this component still cannot be started.

| username: xfworld | Original post link

Check the logs. What error do they report?

| username: TiDBer_vJpITQ5J | Original post link

[Screenshot not available]

| username: xfworld | Original post link

It looks like Prometheus initialization failed.

| username: Billmay表妹 | Original post link

Based on the information you provided, the TiDB Dashboard shows that the cluster health check failed, indicating that the necessary component NgMonitoring is not started. This could be due to one of the following reasons:

  1. The NgMonitoring component was not correctly deployed or started.
  2. The NgMonitoring component failed to start, possibly due to configuration errors or other issues.

To resolve this issue, you can follow these steps:

  1. Confirm whether the NgMonitoring component has been correctly deployed and started. You can check with the following command:

$ tiup cluster display <cluster-name>

     If the NgMonitoring component is not correctly deployed or started, try redeploying or starting it. The specific steps can be found in Enable Continuous Profiling.
  2. If the NgMonitoring component has been correctly deployed and started but the issue persists, check the NgMonitoring configuration file and make sure its settings are consistent with those of the other components in the cluster.
  3. If you still cannot resolve the issue, check the logs of the Prometheus node (where NgMonitoring runs) as well as the TiDB, TiKV, and PD components for other errors or anomalies. In a TiUP deployment, each component writes its logs to the log directory under its deploy directory on the host where it runs. The Deploy Dir column of the tiup cluster display output shows that path:

$ ls <deploy-dir>/log/

Where <deploy-dir> is the deploy directory of the component whose logs you want to view, such as the tidb, tikv, pd, or prometheus instance.
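
The first check above can be narrowed down to the monitoring node with a small helper. This is only a sketch under stated assumptions: it assumes the cluster is managed by TiUP, that NgMonitoring is deployed together with the prometheus role, and that `tiup cluster display` prints its usual columns (ID, Role, Host, Ports, OS/Arch, Status, Data Dir, Deploy Dir). The function name `prometheus_rows` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pick the prometheus instance(s) out of `tiup cluster display` output.
# Assumed column order (not confirmed in this thread):
#   ID  Role  Host  Ports  OS/Arch  Status  Data Dir  Deploy Dir

# Reads the display output on stdin and prints "host status log-dir"
# for every row whose Role column is "prometheus".
prometheus_rows() {
  awk '$2 == "prometheus" { print $3, $6, $8 "/log" }'
}

# Usage on the control machine (placeholder cluster name):
#   tiup cluster display <cluster-name> | prometheus_rows
```

If the reported status is not Up, or the log directory on that host contains startup errors, the problem most likely lies with the monitoring node rather than with TiDB, TiKV, or PD.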

| username: caiyfc | Original post link

Take a look at this. It is recommended to first try deleting the /tidb-data/prometheus-9090/docdb file and then restarting.
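
The suggestion above could be carried out roughly as follows. This is a sketch, not a verified procedure: it assumes a TiUP-managed cluster, uses the docdb path quoted in the post, and renames the directory instead of deleting it so that it can be restored. `CLUSTER_NAME`, `DOCDB_DIR`, and `DRY_RUN` are placeholder variables introduced here, and by default the script only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: move the NgMonitoring docdb aside, then restart the prometheus role.
# Assumptions (not from the thread): TiUP-managed cluster; the mv step must run
# on the Prometheus host, the tiup steps on the TiUP control machine.
CLUSTER_NAME="${CLUSTER_NAME:-<cluster-name>}"
DOCDB_DIR="${DOCDB_DIR:-/tidb-data/prometheus-9090/docdb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reset_docdb() {
  # Keep a backup instead of deleting the directory outright.
  run mv "$DOCDB_DIR" "${DOCDB_DIR}.bak"
  # Restart only the monitoring role, not the whole cluster.
  run tiup cluster restart "$CLUSTER_NAME" -R prometheus
  # Confirm the instance status afterwards.
  run tiup cluster display "$CLUSTER_NAME"
}

reset_docdb
```

Restarting only the prometheus role avoids another full cluster restart. Once the health check passes again, the `.bak` copy can be removed.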