The cluster deployed using TiUP shows that the necessary component NgMonitoring is not started

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用 TiUP 部署的集群显示集群中未启动必要组件 NgMonitoring

| username: myzz

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0
[Encountered Issue: Phenomenon and Impact]


After completing the deployment according to the official TiUP installation process, the cluster reports that the necessary component NgMonitoring is not started. I then consulted the troubleshooting steps on the official website.

After working through those steps, I confirmed that ng_port is indeed set in the configuration.
To be safe, I reloaded once, but the same error still occurs after reloading. I don’t know how to solve it.

Thank you, experts, for your help in answering this.

| username: caiyfc | Original post link

Take a look at this thread: "NgMonitoring cannot start issue" in the TiDB Q&A Community (asktug.com), under TiDB Technical Issues / Deployment & Operations Management.

| username: myzz | Original post link

I just checked, and there is no /data/tidb-data/prometheus-9090/docdb file on my ng_monitoring machine.

| username: caiyfc | Original post link

Check the logs to see if there are any errors.

| username: myzz | Original post link

Where can I find all the startup logs related to TiDB? I couldn’t find it in the documentation. :expressionless:

| username: caiyfc | Original post link

Check if there are any error messages in this path.
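For anyone doing the same check, here is a minimal sketch that scans a log directory for error-level lines. It assumes the logs use TiDB-style bracketed level tags such as `[ERROR]` and that the files end in `.log`; the function name `find_error_lines` is made up for illustration, and the directory is passed on the command line.

```python
import sys
from pathlib import Path

# Assumption: log lines carry TiDB-style bracketed level tags.
LEVEL_TAGS = ("[ERROR]", "[FATAL]", "[PANIC]")


def find_error_lines(log_dir):
    """Return (file name, line number, text) for each line with an error-level tag."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    hits = []
    for log_file in sorted(directory.glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if any(tag in line for tag in LEVEL_TAGS):
                    hits.append((log_file.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python scan_logs.py <log directory>
    if len(sys.argv) != 2:
        sys.exit("usage: scan_logs.py <log directory>")
    for name, number, text in find_error_lines(sys.argv[1]):
        print(f"{name}:{number}: {text}")
```

An empty result means either that nothing was logged at those levels or that the component never got far enough to write a log, so it is worth also checking whether the directory contains any files at all.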

| username: myzz | Original post link

I went to /data/tidb-deploy/prometheus-9090/log and didn’t see any logs. I went to /data/tidb-deploy/prometheus-9090/bin and executed ./ng-monitoring-service, but got this error:

However, in the ngmonitoring configuration, pd_servers is set to the address of another node. Why does 127.0.0.1 appear here?

| username: caiyfc | Original post link

Let’s see if anyone else has any ideas?
Alternatively, you could scale in Prometheus first and then scale it out again, which is equivalent to redeploying Prometheus, and see whether that helps.

| username: myzz | Original post link

I placed Prometheus on the PD node and then manually started ngmonitoring, and it worked. :expressionless: Is it possible that ngmonitoring didn’t read the PD address from the configuration file?

| username: caiyfc | Original post link

That does seem to be what is happening. You can try changing the IP in the configuration file that ngmonitoring reads, which should be located under /data/tidb-deploy/prometheus-9090/conf. After making the change, restart Prometheus, but do not reload, because a reload regenerates the configuration files and would overwrite your edit.

| username: myzz | Original post link

Solved it. After taking a closer look, I found that the endpoints entry in the ngmonitoring.toml configuration is an array, but there were no commas separating its elements. After adding the commas and manually starting ngmonitoring, it worked fine. :expressionless:

| username: caiyfc | Original post link

:rofl: But be careful going forward. Reloading the cluster regenerates the configuration files of each component, so this issue might occur again.

| username: myzz | Original post link

I’ve filed an issue for now. If the problem comes back, I’ll go back and fix the file again. Anyway, ngmonitoring doesn’t affect the normal use of the cluster, right? :confounded:
