PD node went offline unexpectedly, no obvious error in the logs

translator_bot · June 22, 2024, 2:15pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd节点异常下线，查看日志无明显error

| username: Jolyne

[TiDB Usage Environment] Production Environment / Test / Poc
Production Environment
[TiDB Version]
5.2.1
[Reproduction Path] What operations were performed when the issue occurred
Around 4 PM, we found the PD node in a Down state, with no errors in its logs. When we checked the server, the PD service had simply stopped.
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

| username: 裤衩儿飞上天

Does the last PD log entry stop at 13:27?
What do the OS logs show?

| username: Jolyne

It stops at around 1 o’clock. What do you mean by OS logs?

| username: 裤衩儿飞上天

I mean /var/log/messages on the PD node that went down.
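
A rough way to scan that file for the usual reasons a service stops without logging anything itself is sketched below. The function name `scan_os_log` and the pattern list are my own assumptions, not something from this thread, and reading /var/log/messages normally needs root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent OS log lines that hint at why a
# service stopped without writing an error of its own.
scan_os_log() {
  local file="${1:-/var/log/messages}"
  # Assumed patterns: a full disk and kernel OOM kills.
  grep -iE 'no space left on device|out of memory|oom-killer|killed process' "$file" \
    | tail -n 20
}

# Usage (as root on the PD host):
#   scan_os_log /var/log/messages
```

If nothing matches, it is still worth reading the entries around the time the PD log stops.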

| username: Jolyne

It’s all this error.

| username: Jolyne

I found the cause: the disk is full.

| username: Jolyne

Can these PD logs be deleted? Also, should I bring PD back by scaling it in and then scaling it out again?

| username: 裤衩儿飞上天

Yes, the logs can be deleted.
Once you have freed up disk space, just start the node directly.
And get monitoring set up.
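
The steps above could look roughly like this. The cluster name, node address, log directory, rotated-file name pattern, and 30-day cutoff are all placeholders I made up for illustration, so check them against `tiup cluster display` and the actual log directory before running anything. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a placeholder, not taken from this thread.
CLUSTER="${CLUSTER:-my-cluster}"                      # hypothetical cluster name
PD_NODE="${PD_NODE:-10.0.1.1:2379}"                   # hypothetical PD node (host:port)
PD_LOG_DIR="${PD_LOG_DIR:-/tidb-deploy/pd-2379/log}"  # assumed PD log directory

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_pd() {
  # 1. See how full the disk is and how much space the PD logs take.
  run df -h "$PD_LOG_DIR"
  run du -sh "$PD_LOG_DIR"
  # 2. Delete rotated PD logs older than 30 days. The pattern assumes rotated
  #    files are named pd-<timestamp>.log; the active pd.log does not match it.
  run find "$PD_LOG_DIR" -name 'pd-*.log' -mtime +30 -delete
  # 3. Start only the stopped PD node, then confirm its status.
  run tiup cluster start "$CLUSTER" -N "$PD_NODE"
  run tiup cluster display "$CLUSTER"
}

recover_pd
```

Starting the single node with `-N` avoids touching the other PD members, which is why no scale-in or scale-out is needed here.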

| username: Jolyne

Okay, thank you very much.

| username: Jolyne

I have another question. I have 3 nodes, and only this node's logs are unusually large; the logs on the other nodes are normal. Is there a rule for how logs are generated, or is it because this node is the leader? What else can affect the size of the PD logs? February's logs alone are 200 GB, more than all of last year's.

| username: 裤衩儿飞上天

There is no special rule. If this node frequently reports errors or warnings, it will generate more logs than the others.
You can set a log retention period.
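
To see what is actually filling the log on that node, one option is to count entries per log level. This is only a sketch: it assumes the usual log layout of TiDB components, where the level is the second bracketed field (for example `[2024/02/01 10:00:00.000 +08:00] [WARN] ...`), and `count_levels` is a name I made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<count> <LEVEL>" for each log level in a file,
# most frequent first.
count_levels() {
  local file="$1"
  # Keep only the leading "[timestamp] [LEVEL]" part of each line, strip the
  # brackets from the level, and tally the levels.
  grep -oE '^\[[^]]+\] \[[A-Z]+\]' "$file" \
    | awk '{ gsub(/\[|\]/, "", $NF); count[$NF]++ }
           END { for (level in count) print count[level], level }' \
    | sort -rn
}

# Usage (the path is an assumed default, adjust to your deployment):
#   count_levels /tidb-deploy/pd-2379/log/pd.log
```

Running it on the large node and on a normal one should show whether the extra volume comes from WARN/ERROR entries or from ordinary INFO lines.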

| username: Jolyne

Okay, thank you.

| username: 裤衩儿飞上天

For details, see: PD Configuration File Description | PingCAP Documentation Center

| username: 裤衩儿飞上天

tiup cluster edit-config <cluster-name>
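
Putting the retention setting and this command together, a minimal sketch might look like the following. The cluster name and the retention values are placeholders I chose for illustration, and the `log.file.*` option names should be checked against the PD configuration documentation for your version. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: the cluster name and retention values are placeholders.
CLUSTER="${CLUSTER:-my-cluster}"   # hypothetical cluster name

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

set_pd_log_retention() {
  # 1. Open the cluster configuration in an editor and add, for example:
  #
  #      server_configs:
  #        pd:
  #          log.file.max-days: 30      # keep rotated logs at most 30 days
  #          log.file.max-backups: 50   # keep at most 50 rotated files
  #
  run tiup cluster edit-config "$CLUSTER"
  # 2. Push the new configuration to the PD nodes (a rolling restart).
  run tiup cluster reload "$CLUSTER" -R pd
}

set_pd_log_retention
```

Limiting the reload to the pd role keeps the other components from being restarted.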