PD node abnormally goes offline, no obvious error found in logs

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd节点异常下线,查看日志无明显error

| username: Jolyne

[TiDB Usage Environment] Production Environment / Test / Poc
Production Environment
[TiDB Version]
5.2.1
[Reproduction Path] What operations were performed when the issue occurred
Around 4 PM, we found the PD node in a Down state, with no errors in its logs. On the server itself, the PD service had simply stopped.
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]




| username: 裤衩儿飞上天 | Original post link

Does the PD log really end at 13:27?
What do the OS logs show?

| username: Jolyne | Original post link

Yes, the log stops at around 1 PM. What do you mean by OS logs?

| username: 裤衩儿飞上天 | Original post link

/var/log/messages on the PD node that went down.
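
If the file is long, a small filter can help. This is only a sketch: `scan_os_log` is a hypothetical helper name, and the pattern list is my assumption about messages that often accompany a process stopping abruptly (full disk, OOM killer). It is not an exhaustive list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print OS-log lines that often explain an abrupt stop.
# The patterns are assumptions; extend them for your environment.
scan_os_log() {
  local log_file="${1:-/var/log/messages}"
  grep -iE 'no space left on device|out of memory|oom-killer|killed process' "$log_file"
}
```

Run it as a user that can read the file, for example `scan_os_log /var/log/messages | tail -n 20`. If nothing matches, grep prints nothing and returns a non-zero status.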

| username: Jolyne | Original post link

It’s all this error.

| username: Jolyne | Original post link

I found the error, the disk is full :sweat_smile:

| username: Jolyne | Original post link

Can these PD logs be deleted? Also, should I restart PD by scaling in and then scaling out?

| username: 裤衩儿飞上天 | Original post link

  1. Yes, the logs can be deleted.
  2. After freeing up disk space, just start the node directly. There is no need to scale in and out.
  3. Set up monitoring first.

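
The steps above can be sketched roughly as follows. The helper names `disk_usage_pct` and `largest_files` are made up for illustration, and the paths, cluster name, and node address in the comments are placeholders, not values from this cluster.

```shell
#!/usr/bin/env bash
# Print how full the filesystem holding a path is, as an integer percentage.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# List the n largest files under a directory as "<size in KiB><TAB><path>".
largest_files() {
  local dir="$1" n="${2:-10}"
  find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Example session (all paths and names are placeholders):
#   disk_usage_pct /path/to/pd/log
#   largest_files /path/to/pd/log 5
#   rm /path/to/pd/log/<old rotated log>
#   tiup cluster start <cluster-name> -N <pd-host>:2379
```

The `-N` flag limits `tiup cluster start` to the given node, so the other PD nodes are left alone. The same `disk_usage_pct` check could feed a cron job or an alert rule until proper monitoring is in place.
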
| username: Jolyne | Original post link

Okay, thank you very much.

| username: Jolyne | Original post link

I have one more question. I have 3 nodes, and only this node's logs are unusually large; the logs on the other nodes are normal. Are there rules governing how logs are generated, or is it because this node is the leader? What else could affect the size of the PD logs? The logs for February alone are 200 GB, more than all of last year.

| username: 裤衩儿飞上天 | Original post link

  1. There are no special rules. If this node frequently reports errors or warnings, it will generate more logs than usual.
  2. You can set the log retention time.

| username: Jolyne | Original post link

Okay, thank you.

| username: 裤衩儿飞上天 | Original post link

For details, see: PD Configuration File Description | PingCAP Documentation Center

| username: 裤衩儿飞上天 | Original post link

tiup cluster edit-config <cluster-name>
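
A rough sketch of the whole change, assuming the `log.file.max-days` and `log.file.max-backups` keys from the PD configuration file documentation apply to your version. `pd_log_retention_yaml` is a hypothetical helper, and the values 7 and 10 are illustrative, not recommendations.

```shell
#!/usr/bin/env bash
# Print a YAML fragment for the pd section of server_configs.
# Arguments: days to keep rotated logs, number of rotated files to keep.
pd_log_retention_yaml() {
  local max_days="${1:-7}" max_backups="${2:-10}"
  cat <<EOF
server_configs:
  pd:
    log.file.max-days: ${max_days}
    log.file.max-backups: ${max_backups}
EOF
}

# Workflow (cluster name is a placeholder):
#   pd_log_retention_yaml 7 10                  # print the fragment
#   tiup cluster edit-config <cluster-name>     # merge it into the topology by hand
#   tiup cluster reload <cluster-name> -R pd    # roll the change out to the PD nodes
```

If `server_configs` already has a `pd` section, add the two keys to it rather than pasting a second section.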