PD Status Frequently Shows Down and TiKV Status Shows N/A When Checking Cluster Information

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 查看集群信息经常出现pd的status变为Down,tikv的Status变为N/A

| username: TiDBer_uI8QIp1t

When using tiup cluster display tidb-cluster, the status of PD often changes to Down, and the status of TiKV changes to N/A. Does this happen to you as well?

| username: tidb菜鸟一只

I have never run into this.

| username: lemonade010

I have not seen this issue on v6.5.0. When the abnormal status appears, check the logs.

| username: 小龙虾爱大龙虾

Normally, this shouldn’t happen. Check if it’s really down by running tiup cluster display tidb-cluster --uptime.
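
As a rough sketch of the check suggested above, the output of the display command can be filtered down to the instances reported as Down or N/A. The helper name abnormal_nodes is hypothetical, and the filter assumes Status is the sixth whitespace-separated column of the tiup cluster display output (ID, Role, Host, Ports, OS/Arch, Status, ...); adjust the column number if your tiup version prints a different layout.

```shell
# Hypothetical helper: read `tiup cluster display` output on stdin and print
# the ID and status of every row whose Status column starts with Down or N/A.
# Assumption: Status is the 6th whitespace-separated column.
abnormal_nodes() {
  awk '$6 ~ /^(Down|N\/A)/ { print $1, $6 }'
}

# Usage against a live cluster (cluster name taken from this thread):
#   tiup cluster display tidb-cluster --uptime | abnormal_nodes
```

If the same instances keep appearing here while their uptime keeps growing, the processes are probably still running and the status probe itself is failing, which points toward the network rather than a crash.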

| username: forever

This has never appeared on my test cluster.

| username: TiDBer_CQ

It has never happened before, but TiDB has experienced issues where it couldn’t start due to OOM (Out of Memory).

| username: 路在何chu

I occasionally encounter this issue, but it resolves itself after a while. The cluster itself is fine; it might be a network problem, so I haven’t paid much attention to it.

| username: 有猫万事足

This issue shouldn’t occur frequently. Is it a mixed deployment? Could it be due to insufficient memory, causing frequent OOM (Out of Memory) errors?

| username: 像风一样的男子

Are you experiencing network fluctuations?

| username: Kongdom

It happened a few years ago when I first started using it, but it hasn’t happened since.

| username: DBRE

Can the machine running tiup ping each node normally?

| username: redgame

It has been stable for me. Check the network.

| username: changpeng75

Which version of TiDB is it? Try to choose a stable LTS version. The network stability also needs to be checked.

| username: DBAER

Run ps -ef | grep tidb to check the process start time.
Which specific version are you on?
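
A minimal sketch of the start-time check, assuming a Linux host with the procps version of ps. The helper name proc_started is hypothetical; the process names pd-server, tikv-server, and tidb-server in the usage comment are the standard binary names and may need adjusting for your deployment.

```shell
# Hypothetical helper: print the full start time, elapsed run time, and
# command line of the process with the given PID.
proc_started() {
  ps -o lstart= -o etime= -o args= -p "$1"
}

# Usage on a cluster node:
#   for pid in $(pgrep -f 'pd-server|tikv-server|tidb-server'); do
#     proc_started "$pid"
#   done
```

A start time that is more recent than the last planned restart suggests the process really did go down and come back, for example after an OOM kill. An old start time suggests only the status display was wrong.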

| username: zhanggame1

I run several test clusters and have never seen anything like this. Could there be a network issue?

| username: Kongdom

Just to clarify, is it only the display showing an abnormal status, or is the actual cluster unavailable?

| username: TIDB-Learner

I haven’t encountered this before. Previously, my TiDB instances would restart periodically, but since there were multiple instances, it didn’t affect usage. After upgrading to 6.5, it hasn’t happened again.

| username: zhanggame1

Did the periodic restarts happen at fixed times?

| username: dba远航

I have never seen this happen. Does your issue occur only occasionally, or has it always been like this?

| username: 霸王龙的日常

We are using version 7.1 here and have not encountered this issue.