Why does the blackbox component keep running after tiup stops the cluster, when all other components shut down successfully? Please advise

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用tiup关闭集群,其它组件关闭完成,blackbox未关闭,是何原因,请大佬指教

| username: hanyj

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] 6.1.4
[Reproduction Path] tiup cluster stop ${cluster-name}
[Encountered Problem: Problem Phenomenon and Impact] When using tiup to stop the cluster, the other components shut down successfully but blackbox did not; I later stopped blackbox manually as a separate step.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]
Error: failed to stop: node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s

2024-06-12T16:57:17.026+0800 INFO Execute command finished {"code": 1, "error": "failed to stop: node_exporter-9100.service, please check the instance's log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s", "errorVerbose": "timed out waiting for port 9100 to be stopped after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStopped\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:130\ngithub.com/pingcap/tiup/pkg/cluster/operation.systemctlMonitor.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:338\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to stop: node_exporter-9100.service, please check the instance's log() for more detail."}
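
The error above says tiup gave up after waiting 2m0s for port 9100 to close. As a rough illustration of that kind of check (a sketch only, not tiup's actual implementation; the function names wait_until_stopped and port_open are hypothetical), a bash loop that polls a port until it stops accepting connections could look like this:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until a probe command reports "not listening"
# or the timeout elapses.
# Usage: wait_until_stopped <timeout_seconds> <interval_seconds> <probe command...>
# The probe must exit 0 while the port is still open and non-zero once it is closed.
wait_until_stopped() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while "$@"; do
    if (( SECONDS >= deadline )); then
      return 1   # still open when the timeout expired
    fi
    sleep "$interval"
  done
  return 0       # probe failed, so nothing is listening any more
}

# Example probe: succeeds while something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, `wait_until_stopped 120 1 port_open 127.0.0.1 9100` returns non-zero if something is still listening on 9100 after two minutes, which mirrors the timeout reported in the log.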

| username: lemonade010 | Original post link

"please check the instance's log() for more detail." Go check the blackbox log.

| username: zhaokede | Original post link

There should be an error recorded; check what the logs say.

| username: hanyj | Original post link

The blackbox_exporter.log only contains the logs from the time it was started.

| username: jiayou64 | Original post link

Is it deployed on a single machine? Try running tiup cluster display <cluster-name> to check.

| username: hanyj | Original post link

No, it is not a single machine:
alertmanager 9093/9094 linux/x86_64 Down
grafana 3000 linux/x86_64 Down
pd 2379/2380 linux/x86_64 Down
pd 2379/2380 linux/x86_64 Down
pd 2379/2380 linux/x86_64 Down
prometheus 9091/12020 linux/x86_64 Down
tidb 13306/10080 linux/x86_64 Down
tidb 13306/10080 linux/x86_64 Down
tidb 13306/10080 linux/x86_64 Down
tikv 20160/2010 linux/x86_64 N/A
tikv 20160/2010 linux/x86_64 N/A
tikv 20160/2010 linux/x86_64 N/A

| username: jiayou64 | Original post link

  • monitored: the monitoring service configuration, i.e., blackbox exporter and node exporter. A node exporter and a blackbox exporter are deployed on every machine.
    Log in to the TiDB host and check whether anything is still listening on port 9100:
    ss -ntl | grep 9100
    To configure the monitoring node separately, refer to the official topology configuration:
    Minimal Topology Architecture | PingCAP Docs

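
A plain grep for 9100 also matches other ports that contain those digits (19100, for instance) and any peer addresses. As a slightly stricter variant of the check above, here is a minimal sketch that keeps only LISTEN rows whose local address ends in the given port. The helper name listeners_on_port is hypothetical, and it assumes the usual `ss -ntl` column layout with the local address in the fourth column:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ntl` (or `ss -ntlp`) output on stdin and print
# only the LISTEN rows whose local address (4th column) ends in ":<port>".
listeners_on_port() {
  local port=$1
  awk -v p=":${port}" '$1 == "LISTEN" && $4 ~ (p "$")'
}
```

Run it as `ss -ntlp | listeners_on_port 9100` on the affected host. If a row is printed, the exporter is still up, and the unit named in the error message can be stopped by hand with `sudo systemctl stop node_exporter-9100.service`.
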
| username: ziptoam | Original post link

Try manually stopping node_exporter.

| username: 郑旭东石家庄 | Original post link

You can try stopping it several times. I ran into a similar issue when stopping a cluster; it is usually caused by latency.

| username: hanyj | Original post link

It was manually stopped afterwards.

| username: hanyj | Original post link

Tried twice but couldn’t shut it down, so I did it manually afterward.

| username: hanyj | Original post link

I stopped it manually afterward; the configuration follows the one on the official website.

| username: ziptoam | Original post link

Sometimes when I start the cluster, I also encounter this timeout error, with some services timing out.

| username: 小于同学 | Original post link

There was an error, right?

| username: hanyj | Original post link

Ah, it also starts.

| username: hanyj | Original post link

The attachment has been posted.

| username: WalterWj | Original post link

Take a look at the corresponding startup script content:

Check if there are any records in the logs.
Normally, shutting down and starting up are very quick.
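
One way to find the startup script and the service logs without guessing paths is to ask systemd about the unit named in the error. The sketch below assumes the exporter runs as the systemd unit node_exporter-9100.service, as the error message suggests; the helper name unit_exec_start is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command from the ExecStart= line(s) of a
# systemd unit file read on stdin.
unit_exec_start() {
  sed -n 's/^ExecStart=//p'
}
```

For example, `systemctl cat node_exporter-9100.service | unit_exec_start` prints the command systemd runs for the exporter, which should point at the startup script, and `journalctl -u node_exporter-9100.service` shows what systemd logged for the unit around the failed stop.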