TiDB PD service is always in a down state

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb pd服务一直是down的状态

| username: hacker_77powerful

[Attachment: Screenshot/Log/Monitoring]
[FATAL] [main.go:232] ["run server failed"] [error="[PD:server:ErrCancelStartEtcd]etcd start canceled"] [stack="main.start

[ERROR] [etcdutil.go:83] ["failed to get cluster from remote"] [error="[PD:etcd:ErrEtcdGetCluster]failed to get raft cluster member(s) from the given URLs: failed to get raft cluster member(s) from the given URLs"]
[2024/04/11 17:39:13.633 +08:00] [WARN] [server.go:2098] ["failed to publish local member to cluster through raft"] [local-member-id=b43ecfd4b44129fc] [local-member-attributes="{Name:pd-1 ClientURLs:[]}"] [request-path=/0/members/b43ecfd4b44129fc/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]

| username: TiDBer_jYQINSnf

Is it a new cluster? It looks like the startup parameters for PD are configured incorrectly, particularly the URL part.

| username: TiDBer_jYQINSnf

Isn’t the PD port usually 2379? Did you make a mistake?

| username: hacker_77powerful

There is no mistake; we used a custom port.

| username: hacker_77powerful

It’s not a new cluster; it has been running for a while. One of the PD nodes has been in a down state because the file system is full.

| username: TiDBer_jYQINSnf

It can’t connect to PD, so check your network first. If the network is fine, list the cluster members from a healthy PD node and see whether the failed PD is still in the list. If it is, use:

member delete

to remove it, then clear the data directory of the failed PD and rebuild the node. PD holds very little data, so rebuilding won’t take long.
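
The network check and the "see if it exists" step above can be sketched in Python. This is only a minimal sketch under assumptions, not the exact commands from this thread: it assumes PD's HTTP API serves the member list at `/pd/api/v1/members` and that each entry carries a `name` field. The address `http://10.0.0.1:2379` is a placeholder (this cluster uses a custom port), `pd-1` is the member name taken from the log above, and the helper names are hypothetical.

```python
import json
import socket
import urllib.request


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_members(pd_url):
    """Fetch the member list from a healthy PD.

    Assumption: the list is served at /pd/api/v1/members as JSON
    with a top-level "members" array.
    """
    url = pd_url.rstrip("/") + "/pd/api/v1/members"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp).get("members", [])


def find_member(members, name):
    """Return the member entry whose name matches, or None if it is absent."""
    for member in members:
        if member.get("name") == name:
            return member
    return None


# Example usage (placeholder address; replace with a healthy PD's client URL):
#   if can_connect("10.0.0.1", 2379):
#       stale = find_member(fetch_members("http://10.0.0.1:2379"), "pd-1")
#       print("still registered" if stale else "already removed")
```

If `find_member` returns an entry, the failed PD is still registered and has to be removed before it is rebuilt. With pd-ctl this is usually done by name or ID, for example `member delete name pd-1`.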

| username: hacker_77powerful

It might be that the PD data directory was not cleared. I’ll try again tomorrow. Thank you.

| username: dba远航

Insufficient disk space will definitely cause server anomalies. Try clearing unused files.
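
Before and after clearing files, it may help to confirm how full the file system actually is. Here is a minimal sketch using Python's standard `shutil.disk_usage`. The path `/data/pd` and the 90% threshold are assumptions for illustration, not values from this cluster.

```python
import shutil


def usage_percent(path):
    """Return the used share (0-100) of the file system that contains path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """Return True if the file system holding path is at or above threshold percent."""
    return usage_percent(path) >= threshold


# Example usage (hypothetical PD data directory):
#   if is_nearly_full("/data/pd"):
#       print("PD data disk is nearly full: %.1f%%" % usage_percent("/data/pd"))
```

Running the check on the PD host against the mount point that holds the PD data directory shows whether the cleanup freed enough space before the node is restarted.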