TiUP deployment of DM fails to start alertmanager_servers and keeps reporting a port 9093 timeout

Original topic: Installing DM with tiup can’t start alertmanager_servers. I’m so fed up; it keeps reporting a 9093 timeout.

username: TiDBer_Q8sRmU9E

[TiDB Usage Environment] Test
[TiDB Version] v8.1.0
[Reproduction Path] Installed following the official documentation
[Encountered Problem: Problem Phenomenon and Impact] Unable to start component alertmanager
Starting instance my_ip:9093

Error: failed to start alertmanager: failed to start: my_ip alertmanager-9093.service, please check the instance’s log (/home/tidb/dm/deploy/alertmanager-9093/log) for more detail.: timed out waiting for port 9093 to be started after 2m0s
Starting dm

[Attachment: Screenshot/Log/Monitoring]
level=info msg="Starting Alertmanager" version="(version=0.26.0, branch=HEAD, revision=d7b4f0c7322e7151d6e3b1e31cbc15361e295d8d)"
ts=2024-06-19T14:20:20.092Z caller=main.go:246 level=info build_context="(go=go1.20.7, platform=linux/amd64, user=root@df8d7debeef4, date=20230824-11:11:58, tags=netgo)"
ts=2024-06-19T14:20:20.092Z caller=main.go:278 level=error msg="unable to initialize gossip mesh" err="create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on \"my_ip\" port 9094: listen tcp my_ip:9094: bind: cannot assign requested address"

Is the local firewall turned off?

SSH into the Alertmanager host and check whether the port is already in use.
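A quick way to do that check on the Alertmanager host, as a minimal Python sketch: try a TCP connection to each port and see whether anything answers. This only tells you whether some process is already listening; it says nothing about firewall rules. `port_is_listening` is a hypothetical helper, and 127.0.0.1 is a placeholder: if the listener may be bound only to the IP in your topology, pass that IP instead.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing accepts
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9093: Alertmanager web port; 9094: cluster (gossip) port seen in the log above.
    # 127.0.0.1 is a placeholder for the address you actually deployed to.
    for port in (9093, 9094):
        state = "in use" if port_is_listening("127.0.0.1", port) else "nothing listening"
        print(f"port {port}: {state}")
```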

Please also post the deployment topology. If it’s not a network or firewall issue, could it be that you deployed everything on a single machine, so the timeout is caused by insufficient CPU?

Also, your IP address was exposed in the logs, so I redacted it for you.

The log says it could not set up the network transport because it failed to obtain an address. Is the port occupied?

Thank you. This test environment runs only this service, and 8 cores / 16 GB should be enough. I’ll go check the topology.

It shouldn’t be. I’ve already checked, and the firewall is enabled.

I also checked whether the port is occupied.

That’s the problem. A cluster has many components, and when they are all started at once, it’s normal for some of them to get no CPU time for 2 minutes.

Also, it looks like you’ve deployed both TiDB and DM on this machine?

That strains resources even further.

If you cannot add more machines, the current solution is to adjust this parameter:

Both tiup cluster and tiup dm have this parameter, with a default timeout of 120 seconds (2 minutes). Set it to 600, i.e. 10 minutes of startup time, and see if that works; if not, increase it further.
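To see when raising the timeout can help, here is a rough Python model of a “wait for the port to be started” check. It assumes the general approach is to poll until a TCP connect succeeds or a deadline passes; it is not TiUP’s actual implementation, and `wait_for_port` is a hypothetical name. `timeout_s` plays the role of the 120-second default or the suggested 600 seconds.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float, interval_s: float = 0.5) -> bool:
    """Poll host:port until a TCP connect succeeds or timeout_s seconds have elapsed.

    Rough model of a "waiting for port N to be started" check, not TiUP's code.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # something is listening: treat the service as started
        except OSError:
            pass  # refused or timed out: not up yet
        if time.monotonic() >= deadline:
            return False  # analogous to "timed out waiting for port ... after 2m0s"
        time.sleep(interval_s)
```

For example, `wait_for_port("my_ip", 9093, timeout_s=600)` with the real IP in place of the redacted `my_ip`. The design point: a longer deadline only helps if the process eventually starts listening. If Alertmanager exits during startup, as the `unable to initialize gossip mesh` error in the log suggests, the check fails no matter how long it waits.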

I’ll give it a try, thanks bro. CPU usage doesn’t look high, though.

I’ve also tried a fresh machine, and 9093/9094 still won’t start there.

Even with 600 it won’t start. Some posts say it’s an IP issue, but I checked and the IPs are correct.

Can you see your external IP with ifconfig?
If ifconfig doesn’t show the external IP, you can only bind it to, since you keep saying the firewall isn’t the issue. If you can see the external IP, I don’t think binding to it should be a problem.
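The log line `bind: cannot assign requested address` is the `EADDRNOTAVAIL` error from `bind()`: the kernel refuses because the address is not configured on any local interface. That is a different failure from a port that is already taken (`EADDRINUSE`, “address already in use”). A minimal Python probe that tells the two apart, assuming you run it on the Alertmanager host while Alertmanager is stopped and pass the IP from your topology; `probe_bind` is a hypothetical helper. It only checks TCP, while the gossip port is also used over UDP, so treat a clean result as necessary rather than sufficient.

```python
import errno
import socket


def probe_bind(ip: str, port: int) -> str:
    """Try to bind a TCP socket to ip:port and classify the result."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
        return "ok"
    except OSError as e:
        if e.errno == errno.EADDRNOTAVAIL:
            # "cannot assign requested address": ip is not on any local interface
            return "address not local"
        if e.errno == errno.EADDRINUSE:
            # "address already in use": another socket already holds ip:port
            return "port in use"
        raise  # anything else (e.g. permission denied) is left to the caller
    finally:
        s.close()  # release the port immediately; we never listen on it


if __name__ == "__main__":
    # Replace 127.0.0.1 with the host from your topology (redacted as my_ip above).
    print(probe_bind("127.0.0.1", 9094))
```

If the topology IP comes back as `address not local`, the process cannot bind to it regardless of firewall state or startup timeout.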

The IP is not a problem.

Try turning off the firewall.