During the TPC-C process, the PD (UI) node in the cluster goes down

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tpcc过程中,集群中pd(UI)节点DOWN掉

| username: 滴滴嗒嘀嗒

TPCC log:

Cluster status:

Then, apart from one TiDB node still online, all TiKV nodes show status N/A and every other node shows DOWN:

Log from the first PD node that went DOWN:

| username: 啦啦啦啦啦 | Original post link

The log shows `apply request took too long`. Is the disk an SSD? Check the resource usage in the monitoring dashboards.
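Before digging into monitoring, it may be worth confirming what the disk actually is. A rough sketch with standard Linux tools (the device name `sdb` is an assumption; substitute your data disk):

```shell
# List block devices; ROTA=1 means rotational (HDD), ROTA=0 means SSD.
lsblk -d -o NAME,ROTA,TYPE,SIZE

# Same flag read directly from sysfs for a specific device (sdb is a placeholder).
cat /sys/block/sdb/queue/rotational

# Watch per-device utilization and average wait time while the workload runs;
# sustained high %util and await point to the disk as the bottleneck.
iostat -x 1 5
```

`iostat` is part of the sysstat package and may need to be installed separately.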

| username: 滴滴嗒嘀嗒 | Original post link

After a node goes down, running `tiup cluster start a_cluster -N <node-id>` to start the specified node fails to bring it up. The corresponding node logs are not updated, and monitoring is also unavailable.
Here are a few questions:

  1. What exactly is the apply request in the logs doing?
  2. If the downtime is too long, will it cause the PD to go offline? Is there a critical point for this duration?
  3. Will one PD going down cause other nodes to malfunction?
  4. Why can’t the node be brought up again after it goes down?
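For question 4, a typical diagnostic sequence with tiup might look like the sketch below. The node address and log path are assumptions; take the real values from `tiup cluster display`:

```shell
# Confirm which instances are Down / N/A and where they are deployed.
tiup cluster display a_cluster

# Try to start only the failed instance (10.0.1.10:2379 is a placeholder address).
tiup cluster start a_cluster -N 10.0.1.10:2379

# If it still fails, read the service log on that host for the real error;
# the deploy path below is an assumption, use the one shown by `display`.
tail -n 100 /path/to/deploy/pd-2379/log/pd.log
```

If the log file has not been updated since the crash, the process is likely dying before it can log anything, so the host's system log (`journalctl` or `dmesg`) is the next place to look.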

Here are the relevant screenshots:
Starting the specified node on the 18th

Node logs are still from the 17th
The hard drive is HDD

| username: Min_Chen | Original post link


TiDB clusters require SSDs for the data disks. Heavy read/write pressure on HDDs can bring the whole cluster down.
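One way to sanity-check whether a disk can sustain the workload is a random-read latency test with fio. This is only an illustrative sketch, not the official TiDB disk qualification procedure; run it inside the data directory on an otherwise idle disk:

```shell
# Illustrative random-read test: 32 KB blocks for 60 seconds against a 1 GB file.
# On an HDD, expect average latency in the milliseconds; SSDs are typically
# well under a millisecond for this pattern.
fio --name=randread-test --ioengine=psync --rw=randread --bs=32k \
    --size=1G --runtime=60 --time_based \
    --filename=fio_test_file --group_reporting

# Clean up the test file afterwards.
rm -f fio_test_file
```

The `clat` (completion latency) figures in the output are the ones to compare against the deployment requirements.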