Encountered an error at the last step of the upgrade:
2023-08-04T17:36:37.770+0800 DEBUG TaskFinish {"task": "UpgradeCluster", "error": "failed to stop: node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s", "errorVerbose": "timed out waiting for port 9100 to be stopped after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStopped\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:130\ngithub.com/pingcap/tiup/pkg/cluster/operation.systemctlMonitor.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:338\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to stop: 10.200.45.134 node_exporter-9100.service, please check the instance's log() for more detail."}
I stopped the service manually, but when I ran tiup cluster replay gbwgtVF0CFs to finish the upgrade, it still reported the same error: Error: failed to stop: node_exporter-9100.service.
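tiup stops the service with the same command, systemctl stop node_exporter-9100. Try running it manually on 10.200.45.134 to see what error it reports, and check the logs. The error is raised in spec.PortStopped, which runs after the stop command has been executed; it means that port 9100 on this node was still in use when the 2m0s timeout expired. That check is done with the ss command, so you can run ss on the node yourself to confirm whether anything is still listening on 9100.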
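To make the port check concrete, here is a minimal sketch in Python of an ss-based "is the port still listening" test. It is not tiup's actual implementation: the `ss -ltn` flags and the helper names `port_is_listening` and `check_port` are assumptions made for illustration. It only reproduces the idea that the stop is considered complete once no listening socket is bound to port 9100.

```python
import subprocess


def port_is_listening(ss_output: str, port: int) -> bool:
    """Return True if any LISTEN row in `ss -ltn` output is bound to `port`.

    Hypothetical helper for illustration; tiup's own check may differ.
    """
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skips the header row and any malformed lines
        local_address = fields[3]
        # Take the text after the last ':' so "[::]:9100" and "*:9100" also work.
        if local_address.rsplit(":", 1)[-1] == str(port):
            return True
    return False


def check_port(port: int = 9100) -> bool:
    """Run `ss -ltn` on the local machine and test the given port.

    Assumes the `ss` binary is installed (iproute2) and that these flags
    are acceptable for the check; both are assumptions, not tiup facts.
    """
    result = subprocess.run(
        ["ss", "-ltn"], capture_output=True, text=True, check=True
    )
    return port_is_listening(result.stdout, port)
```

Calling check_port(9100) on the affected node after systemctl stop should return False once the port is free. If it keeps returning True, some process is still bound to 9100, which would explain why the replay fails at the same step.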