Is it normal for nodes to become abnormal during a BR restore?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br恢复过程中出现节点异常,这是正常现象吗?

| username: 滴滴嗒嘀嗒

As shown in the picture:

| username: 这里介绍不了我 | Original post link

BR probably crashed the node.

| username: zhaokede | Original post link

All PD nodes are down, which is definitely not normal.
Check the logs to see what errors are reported.

| username: 连连看db | Original post link

There seems to be a problem with how the restore was run. When you ran the PITR restore, did you leave out the full-backup storage path? Alternatively, skip PITR and run a full restore instead; that should let you restore the data.

| username: jiayou64 | Original post link

First get the cluster back to normal, since PD isn't working. Then start with a full restore, using restore point.

| username: 滴滴嗒嘀嗒 | Original post link

I had already run a full restore before running restore point; the two were executed separately.

| username: Kamner | Original post link

Your cluster only has two TiKV nodes? You need at least three.
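The "at least three" advice comes from Raft majority arithmetic, which PD (through its embedded etcd) and TiKV Regions both rely on. Below is a minimal, generic Python sketch of that arithmetic, not TiDB code; the helper names (quorum, tolerable_failures, replicas_placeable) are hypothetical, and it assumes PD's default replication.max-replicas of 3, with each replica of a Region placed on a different TiKV store.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft group with `members` voters."""
    return members // 2 + 1


def tolerable_failures(members: int) -> int:
    """Voters that can be lost while a majority is still reachable."""
    return members - quorum(members)


def replicas_placeable(stores: int, max_replicas: int = 3) -> bool:
    """Each replica of a Region needs its own store (assumes default max-replicas = 3)."""
    return stores >= max_replicas


if __name__ == "__main__":
    # Compare fault tolerance for small Raft groups.
    for n in (1, 2, 3, 5):
        print(f"{n} voters: quorum={quorum(n)}, can lose {tolerable_failures(n)}")
    # Two TiKV stores cannot hold three replicas of the same Region.
    print("2 TiKV stores, 3 replicas placeable:", replicas_placeable(2))
    print("3 TiKV stores, 3 replicas placeable:", replicas_placeable(3))
```

With 2 PD members the quorum is 2, so losing either one stops PD; that is no more fault tolerant than a single PD. Three is the smallest group that survives one failure, and with only 2 TiKV stores the default 3 replicas cannot all be placed.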

| username: 呢莫不爱吃鱼 | Original post link

The way the nodes are distributed is unreasonable, and that is causing the crashes.

| username: lemonade010 | Original post link

First, check the PD logs to see why PD crashed. Once it is back to normal, check the BR logs.

| username: 滴滴嗒嘀嗒 | Original post link

It's very strange. I ran the restore several times, and throughout each run the status of these nodes kept switching between normal and abnormal. Each run ended in one of two ways:

  1. After the restore completed, everything returned to normal, and the data was restored successfully.
  2. The nodes stayed abnormal, and the restore task eventually failed.

| username: 滴滴嗒嘀嗒 | Original post link

"not leader" errors have been reported.

| username: 滴滴嗒嘀嗒 | Original post link

There are some ERROR entries in the PD logs, but I can't tell what the root cause of the node crashes is.

| username: TiDBer_QYr0vohO | Original post link

It’s not normal, check the logs.

| username: 滴滴嗒嘀嗒 | Original post link

Why is it unreasonable? Are you referring to deploying multiple roles on a single machine?

| username: 滴滴嗒嘀嗒 | Original post link

This shouldn’t have much of an impact, right?

| username: 滴滴嗒嘀嗒 | Original post link

They were executed separately: first the full backup was restored, then the PITR (Point-In-Time Recovery) restore was run.

| username: tidb菜鸟一只 | Original post link

What kind of topology is this… 2 PD, 2 TiDB, 2 TiKV…

| username: 滴滴嗒嘀嗒 | Original post link

I don't have enough resources; it's just for local testing :grinning:. Do I really have to follow the recommended topology strictly?

| username: TiDBer_q2eTrp5h | Original post link

Check the logs.

| username: tidb菜鸟一只 | Original post link

For testing, it's better to just use tiup playground, or deploy 1 PD, 1 TiDB, and 1 TiKV and configure 1 replica for TiKV…