TiUP installation of DM fails to start alertmanager_servers, keeps reporting 9093 timeout

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiup 安装dm无法启动 alertmanager_servers ,吐了我吐了,一直报9093 timeout

| username: TiDBer_Q8sRmU9E

[TiDB Usage Environment] /Test/
[TiDB Version] v8.1.0
[Reproduction Path] Installed by following the official website documentation
[Encountered Problem: Problem Phenomenon and Impact] Unable to start component alertmanager
Starting instance my_ip:9093

Error: failed to start alertmanager: failed to start: my_ip alertmanager-9093.service, please check the instance’s log (/home/tidb/dm/deploy/alertmanager-9093/log) for more detail.: timed out waiting for port 9093 to be started after 2m0s
Starting dm

[Attachment: Screenshot/Log/Monitoring]
level=info msg="Starting Alertmanager" version="(version=0.26.0, branch=HEAD, revision=d7b4f0c7322e7151d6e3b1e31cbc15361e295d8d)"
ts=2024-06-19T14:20:20.092Z caller=main.go:246 level=info build_context="(go=go1.20.7, platform=linux/amd64, user=root@df8d7debeef4, date=20230824-11:11:58, tags=netgo)"
ts=2024-06-19T14:20:20.092Z caller=main.go:278 level=error msg="unable to initialize gossip mesh" err="create memberlist: Could not set up network transport: failed to obtain an address: Failed to start TCP listener on \"my_ip\" port 9094: listen tcp my_ip:9094: bind: cannot assign requested address"

| username: YuchongXU | Original post link

Is the local firewall turned off?

| username: onlyacat | Original post link

SSH into your Alertmanager’s IP and check if the port is occupied.

| username: 有猫万事足 | Original post link

Please post your deployment topology. If it's not a network or firewall issue, could it be that you deployed everything on a single machine, so that CPU starvation is causing the timeout?

Also, your IP was exposed in the logs, so I edited it for you.

| username: zhaokede | Original post link

The log says "Could not set up network transport: failed to obtain an address". Is the port occupied?

| username: TiDBer_Q8sRmU9E | Original post link

Thank you. The testing environment only has this service. 8+16 cores should be fine. I’ll go check the topology.

| username: TiDBer_Q8sRmU9E | Original post link

It shouldn’t be. I’ve already checked, and the firewall is enabled.

| username: zhaokede | Original post link

Have you checked whether the port is occupied?

| username: 有猫万事足 | Original post link

This is the problem. A cluster has many components, and when they all start at once it is normal for some of them to get no CPU time within the 2-minute window.

Moreover, it seems that you have deployed both TiDB and DM on this machine?

This makes resources even more strained.

If you cannot add more machines, the current solution is to adjust this parameter:

Both tiup cluster and tiup dm have this parameter, with a default timeout of 120 seconds (2 minutes). Adjust it to 600, which allows 10 minutes of startup time. See if that works, and if not, increase it further.
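
The parameter itself is not named in this post. A likely candidate, and this is an assumption, is TiUP's global `--wait-timeout` flag, which takes a value in seconds and defaults to 120. The sketch below only assembles the command line; `dm-test` is a hypothetical cluster name, and `build_start_command` is an illustrative helper, not part of TiUP.

```python
import shlex


def build_start_command(cluster_name: str, wait_timeout_s: int = 600) -> list[str]:
    """Assemble a `tiup dm start` command with a longer wait timeout.

    Assumption: the parameter meant in the post is TiUP's --wait-timeout
    flag, given in seconds.
    """
    if wait_timeout_s <= 0:
        raise ValueError("wait_timeout_s must be positive")
    return ["tiup", "dm", "start", cluster_name, "--wait-timeout", str(wait_timeout_s)]


# "dm-test" is a hypothetical cluster name used only for illustration.
print(shlex.join(build_start_command("dm-test", 600)))
```

This prints `tiup dm start dm-test --wait-timeout 600`, which can be pasted into a shell with the real cluster name. If 600 is still not enough, raise the second argument and try again.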

| username: TiDBer_Q8sRmU9E | Original post link

I’ll give it a try, thanks bro. I see the CPU usage is not high either.

| username: TiDBer_Q8sRmU9E | Original post link

Even on a freshly provisioned machine, Alertmanager still fails to start on 9093/9094. I have tried.

| username: TiDBer_Q8sRmU9E | Original post link

Even with the timeout set to 600, it still fails to start. Some posts say it's an IP issue, but I checked and the IPs are correct.

| username: 有猫万事足 | Original post link

Can you see your external IP with ifconfig?
If you can’t see the external IP with ifconfig, you can only bind it to 0.0.0.0, since you keep saying there’s no issue with the firewall. If you can see the external IP, I don’t think there should be a problem with binding it.
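
To separate the two theories in this thread, a small bind test may help. The `bind: cannot assign requested address` message in the log corresponds to the errno `EADDRNOTAVAIL`, which the kernel returns when the address is not configured on any local interface. An occupied port produces `EADDRINUSE` instead. The following is a minimal sketch, not part of TiUP or Alertmanager; `diagnose_bind` is a hypothetical helper name.

```python
import errno
import socket


def diagnose_bind(host: str, port: int) -> str:
    """Try to bind a TCP socket to host:port and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # The IP is not assigned to any interface on this machine.
                return "address-not-local"
            if exc.errno == errno.EADDRINUSE:
                # Another process already holds this address and port.
                return "port-in-use"
            return f"other-error: {exc}"
    return "ok"
```

Run it on the Alertmanager host with the IP from the topology file and port 9094, for example `print(diagnose_bind("<ip-from-topology>", 9094))`. A result of `address-not-local` would suggest that the configured IP is an external or NAT address that the machine itself does not own, in which case raising the timeout or changing the firewall would not be expected to help.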

| username: TiDBer_Q8sRmU9E | Original post link

The IP is not a problem.

| username: Kongdom | Original post link

Try turning off the firewall.