Why does blackbox keep running when I stop the cluster with tiup, even though the other components stop successfully? Please advise

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用tiup关闭集群,其它组件关闭完成,blackbox未关闭,是何原因,请大佬指教

| username: hanyj

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] 6.1.4
[Reproduction Path] tiup cluster stop ${cluster-name}
[Encountered Problem: Problem Phenomenon and Impact] Using tiup to shut down the cluster, other components shut down successfully, but blackbox did not shut down; later manually executed the shutdown of blackbox separately.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]
Error: failed to stop: 10.10.0.1 node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s

2024-06-12T16:57:17.026+0800 INFO Execute command finished {"code": 1, "error": "failed to stop: 10.10.0.1 node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s", "errorVerbose": "timed out waiting for port 9100 to be stopped after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStopped\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:130\ngithub.com/pingcap/tiup/pkg/cluster/operation.systemctlMonitor.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:338\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to stop: 10.10.0.1 node_exporter-9100.service, please check the instance's log() for more detail."}
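The error and stack trace above (`spec.PortStopped`, `module.(*WaitFor).Execute`) indicate that tiup waited 2 minutes for port 9100 to close and then gave up. The following is only a rough sketch of that kind of wait loop, not tiup's actual implementation. The function name `wait_port_closed` is hypothetical, and the sketch assumes bash (for `/dev/tcp`) and coreutils `timeout` are available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until nothing accepts TCP connections on host:port,
# or give up after roughly DEADLINE attempts (one per second).
# This only mimics the idea behind the
# "timed out waiting for port 9100 to be stopped" check; it is not tiup's code.
wait_port_closed() {
  local host=$1 port=$2 deadline=${3:-120} waited=0
  while [ "$waited" -lt "$deadline" ]; do
    # bash's /dev/tcp pseudo-device attempts a TCP connect; a failure
    # (or no answer within 1s) is treated here as "port closed".
    if ! timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0   # port no longer accepts connections
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1       # still open after the deadline
}
```

For example, `wait_port_closed 10.10.0.1 9100 120` would return non-zero if node_exporter were still listening after about two minutes, which matches the situation reported here.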

| username: lemonade010 | Original post link

"please check the instance's log() for more detail." Go check the blackbox log.

| username: zhaokede | Original post link

There should be an error recorded; check the details in the logs.

| username: hanyj | Original post link

The blackbox_exporter.log only contains the logs from the time it was started.

| username: jiayou64 | Original post link

Is it deployed on a single machine? Try running tiup cluster display <cluster-name> to check.

| username: hanyj | Original post link

Not a single machine
alertmanager 10.10.0.1 9093/9094 linux/x86_64 Down
grafana 10.10.0.1 3000 linux/x86_64 Down
pd 10.10.0.2 2379/2380 linux/x86_64 Down
pd 10.10.0.1 2379/2380 linux/x86_64 Down
pd 10.10.0.3 2379/2380 linux/x86_64 Down
prometheus 10.10.0.1 9091/12020 linux/x86_64 Down
tidb 10.10.0.2 13306/10080 linux/x86_64 Down
tidb 10.10.0.1 13306/10080 linux/x86_64 Down
tidb 10.10.0.3 13306/10080 linux/x86_64 Down
tikv 10.10.0.2 20160/2010 linux/x86_64 N/A
tikv 10.10.0.1 20160/2010 linux/x86_64 N/A
tikv 10.10.0.3 20160/2010 linux/x86_64 N/A

| username: jiayou64 | Original post link

  • monitored: the monitoring service configuration, i.e. blackbox_exporter and node_exporter. One node_exporter and one blackbox_exporter are deployed on every machine.
    Log in to 10.10.0.1 and check whether port 9100 is still listening:
    ss -ntl|grep 9100
    To configure the monitoring node separately, refer to the official topology configuration:
    Minimal Topology Architecture | PingCAP Documentation Center
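A slightly stricter version of the port check can be sketched as follows. The helper name `listening_on` is hypothetical. It assumes the usual `ss -ntl` column layout, where the first column is the state and the fourth is the local address and port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ntl` (or `ss -ntlp`) output on stdin and keep
# only LISTEN rows whose local address ends in exactly :PORT.
# A plain `grep 9100` would also match ports such as 19100.
listening_on() {
  awk -v suffix=":$1" '$1 == "LISTEN" && $4 ~ (suffix "$")'
}

# Usage on the affected host (run with sudo so -p can show the process name):
#   sudo ss -ntlp | listening_on 9100
```

If this prints a row after `tiup cluster stop` has finished, something is still bound to port 9100, and the `-p` column should show which process it is.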
| username: ziptoam | Original post link

Try manually stopping node_exporter.
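A minimal sketch of a manual stop, assuming the exporters run as systemd units named like the one in the error message (`node_exporter-9100.service`). The helper names `unit_for` and `stop_unit` are hypothetical. The blackbox_exporter port 9115 in the usage comment is an assumption that does not appear in this thread, so check your own topology before using it.

```shell
#!/usr/bin/env bash
# Unit names follow the "<name>-<port>.service" pattern seen in the error
# message (node_exporter-9100.service).
unit_for() {
  printf '%s-%s.service\n' "$1" "$2"
}

# Hypothetical wrapper: stop one unit, or only print the command if DRY_RUN=1.
stop_unit() {
  local unit
  unit=$(unit_for "$1" "$2")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo systemctl stop $unit"
  else
    sudo systemctl stop "$unit"
  fi
}

# Usage on 10.10.0.1:
#   DRY_RUN=1 stop_unit node_exporter 9100   # show the command first
#   stop_unit node_exporter 9100
#   stop_unit blackbox_exporter 9115         # 9115 is an assumed port
```

Running with `DRY_RUN=1` first makes it easy to confirm the unit name before anything is actually stopped.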

| username: 郑旭东石家庄 | Original post link

You can try stopping it a few more times. I ran into a similar issue when stopping a cluster. It is generally caused by latency.

| username: hanyj | Original post link

It was manually stopped afterwards.

| username: hanyj | Original post link

Tried twice but couldn’t shut it down, so I did it manually afterward.

| username: hanyj | Original post link

I stopped it manually later; the configuration follows the one on the official website.

| username: ziptoam | Original post link

Sometimes when I start the cluster, I also encounter this timeout error, with some services timing out.

| username: 小于同学 | Original post link

There was an error, right?

| username: hanyj | Original post link

Ah, it also starts.

| username: hanyj | Original post link

The attachment has been posted.

| username: WalterWj | Original post link

Take a look at the corresponding startup script content:

Check if there are any records in the logs.
Normally, shutting down and starting up are very quick.