After Rebuilding the PD Cluster, "PD server timeout" Errors Occur Frequently

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd集群重搭后 经常 pd server out

| username: kuuhaku

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.3.3
[Reproduction Path] PD cluster has been rebuilt
[Encountered Problem: Phenomenon and Impact]
Queries against some tables fail with "PD server timeout".
Sometimes select * from xxx where id=1 works,
but select * from xxx limit 1 does not.

Experts, please take a look
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]

| username: kuuhaku | Original post link

Supplement

| username: h5n1 | Original post link

First, check the blackbox exporter / node exporter monitoring to see whether the PD leader node has any network latency, high CPU usage, or high disk latency.
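
Alongside the dashboards, it can help to ask PD itself which member is the leader and whether every member reports healthy. The following is only a rough sketch: it assumes PD's HTTP API is reachable on the client URL (the address `http://127.0.0.1:2379` is a placeholder) and that `/pd/api/v1/health` returns a JSON list with `name` and `health` fields. The helper names `fetch_json`, `unhealthy_members`, and `check_pd` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=3.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_members(health):
    """Return the names of PD members whose 'health' flag is not true.

    `health` is assumed to be the list returned by /pd/api/v1/health.
    """
    return [m.get("name", "?") for m in health if not m.get("health", False)]


def check_pd(pd_addr):
    """Print the current leader and any unhealthy members.

    `pd_addr` is a placeholder such as "http://127.0.0.1:2379".
    """
    leader = fetch_json(pd_addr + "/pd/api/v1/leader")
    health = fetch_json(pd_addr + "/pd/api/v1/health")
    print("leader:", leader.get("name"))
    print("unhealthy:", unhealthy_members(health) or "none")
```

Calling `check_pd("http://<pd-host>:2379")` from the TiDB host shows whether the rebuilt cluster has a leader at all, and which node's exporter panels to look at first.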

| username: tidb狂热爱好者 | Original post link

Check whether CPU usage on TiKV and TiDB is high. I have run into a situation with “execute limit” before, where slow SQL queries caused the business to hang and the TiDB dashboard was all red. After the slow SQL was fixed, PD could be reached again.

| username: kuuhaku | Original post link

Are you looking at it on Grafana?

| username: h5n1 | Original post link

I think the main reason is that the tidb-server process is not running. You can check the status of the tidb-server process on the corresponding machine.

| username: kuuhaku | Original post link

The server is normal, the strangest thing is that it only appears in some tables.

| username: kuuhaku | Original post link

node exporter (screenshot)

blackbox exporter (screenshot)

| username: songxuecheng | Original post link

Check whether only a single TiKV node is reporting errors.

| username: kuuhaku | Original post link

Let me take a look.

| username: kuuhaku | Original post link

TiKV did not report any errors.

| username: songxuecheng | Original post link

Then check the status and logs of PD.
Also, check the network connectivity between the TiDB server and PD, and whether the firewall is enabled.
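
A quick way to test connectivity from the TiDB host is a plain TCP connect to each PD client port, timing how long it takes. This is a minimal sketch using only the Python standard library; the host names in the usage note are placeholders, and port 2379 is assumed to be the PD client port (the default unless it was changed during the rebuild).

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Try a TCP connect and return (ok, elapsed_seconds).

    ok is False on refusal, timeout, or any other socket error, which is
    also what a firewall drop or reject rule would typically look like.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start
```

For example, looping over `("pd-1", "pd-2", "pd-3")` and calling `tcp_reachable(host, 2379)` on every TiDB node would show whether one PD address is unreachable or consistently slow to connect.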

| username: kuuhaku | Original post link

Could you please check the monitoring screenshots and see if there are any issues?

| username: songxuecheng | Original post link

How large is this table?

| username: kuuhaku | Original post link

Close to 2GB

| username: songxuecheng | Original post link

Confirm whether only one TiDB node is reporting an error.
If so, please send the corresponding node monitoring data.

| username: kuuhaku | Original post link

All TiDB nodes are reporting errors.
I found that queries with the primary key are normal, but non-primary key queries do not work.

| username: songxuecheng | Original post link

Please provide the PD leader’s log.
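
Before posting the whole log, it may be easier to pull out just the warning and error lines. The sketch below assumes the PD log uses the bracketed TiDB log format (`[timestamp] [LEVEL] [source] ["message"] ...`); the function name `filter_log` and the file path in the usage note are hypothetical.

```python
import re

# Matches the leading "[timestamp] [LEVEL] " of a bracketed log line.
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<rest>.*)$")


def filter_log(lines, levels=("WARN", "ERROR", "FATAL"), keyword=None):
    """Return (timestamp, level, rest) for lines at the given levels.

    If `keyword` is set, keep only lines containing it (case-insensitive).
    Lines that do not match the expected format are skipped.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m or m.group("level") not in levels:
            continue
        if keyword and keyword.lower() not in line.lower():
            continue
        hits.append((m.group("ts"), m.group("level"), m.group("rest")))
    return hits
```

Usage would look like `filter_log(open("/path/to/pd.log", encoding="utf-8"))`, optionally with a `keyword` to narrow the output to the time window when the queries failed.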

| username: h5n1 | Original post link

Run trace select xxx on the failing SQL and see whether it returns any results.