Upgrading from 5.3.1 to 6.5.3 reports failed to stop: XXXXX node_exporter-9100.service

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 从5.3.1 升级只6.5.3 报failed to stop: XXXXX node_exporter-9100.service

| username: 饭光小团

Encountered an error at the last step of the upgrade:
2023-08-04T17:36:37.770+0800 DEBUG TaskFinish {"task": "UpgradeCluster", "error": "failed to stop: node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s", "errorVerbose": "timed out waiting for port 9100 to be stopped after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStopped\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:130\ngithub.com/pingcap/tiup/pkg/cluster/operation.systemctlMonitor.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:338\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to stop: 10.200.45.134 node_exporter-9100.service, please check the instance's log() for more detail."}

| username: tidb菜鸟一只 | Original post link

Check if this issue is the same as the one mentioned here:
The problem of node_export not starting when upgrading from 5.3.4 to 6.0. - TiDB Technical Issues - TiDB Q&A Community (asktug.com)

| username: tidb狂热爱好者 | Original post link

Manually stop node_exporter, then run the upgrade:
service node_exporter-9100 stop

| username: 饭光小团 | Original post link

Do I need to manually stop node_exporter on every node?

| username: 饭光小团 | Original post link

In my case it cannot be stopped; the node_exporter process is still running:
fdc 23911 1 2 17:34 ? 00:00:02 bin/node_exporter --web.listen-address=:9100 --collector.tcpstat --collector.systemd --collector.mountstats --collector.meminfo_numa --collector.interrupts --collector.buddyinfo --collector.vmstat.fields=^.* --log.level=info
fdc 23918 23911 0 17:34 ? 00:00:00 /bin/bash /home/fdc/tidb/deploy_default/deploy/monitor-9100/scripts/run_node_exporter.sh
fdc 23920 23918 0 17:34 ? 00:00:00 tee -i -a /home/fdc/tidb/deploy_default/deploy/monitor-9100/log/node_exporter.log

| username: 饭光小团 | Original post link

The error is this: timed out waiting for port 9100 to be stopped after 2m0s

| username: tidb菜鸟一只 | Original post link

Have you tried stopping it manually?

| username: 饭光小团 | Original post link

I stopped it manually, but when I ran tiup cluster replay gbwgtVF0CFs to resume and complete the upgrade, it still reported: Error: failed to stop: node_exporter-9100.service.

| username: redgame | Original post link

Are there any leftover processes? Run ps -ef | grep node_exporter to check whether any related processes are still running.
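
One catch with ps -ef | grep node_exporter is that the grep command itself shows up in the result, so the output is never empty. A minimal sketch of a filter that drops that line; the function name leftover_node_exporter is made up for illustration and is not part of tiup:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tiup): reads `ps -ef` style lines on stdin,
# keeps those mentioning node_exporter, and drops the grep command's own entry.
leftover_node_exporter() {
  grep 'node_exporter' | grep -v 'grep' || true
}

# Intended use on the affected node:
#   ps -ef | leftover_node_exporter
# Empty output means no node_exporter-related process is left.
```

Note that the wrapper script run_node_exporter.sh and the tee process writing node_exporter.log also match, as in the process listing earlier in this thread, so all of them need to be gone.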

| username: 饭光小团 | Original post link

I checked, and there are none.

| username: 饭光小团 | Original post link

So, based on everyone's analysis, the upgrade should go through as long as node_exporter is stopped, right?

| username: zhanggame1 | Original post link

Node exporter can be deployed independently.

| username: ffeenn | Original post link

tiup itself runs systemctl stop node_exporter-9100 to stop the service. Try running that command manually to see what error it reports, and check the logs. The error is raised in spec.PortStopped, which runs after the stop command and found that port 9100 on this node was still listening. The check is done with the ss command.
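
To see the same thing tiup sees, the port can be checked by hand after stopping the service. The following is only a sketch: port_in_listing is a hypothetical helper name, and it assumes the usual ss -ltn column layout (State, Recv-Q, Send-Q, Local Address:Port, ...). It does not reproduce tiup's exact check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tiup): reads `ss -ltn` style output on stdin
# and succeeds if some LISTEN socket's local address ends in ":PORT".
port_in_listing() {
  local port="$1"
  awk -v p=":${port}" '
    $1 == "LISTEN" {
      n = length($4)
      # Compare only the trailing ":PORT" so that 19100 does not match 9100.
      if (n >= length(p) && substr($4, n - length(p) + 1) == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Intended use on the affected node (10.200.45.134 in the log above):
#   sudo systemctl stop node_exporter-9100
#   ss -ltn | port_in_listing 9100 && echo "port 9100 is still listening"
```

If the port is still listed after the stop command returns, something is still bound to 9100, and the replay will time out again in the same place.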