Using BR for full backup in TiDB v7.5.1 sometimes succeeds and sometimes fails with the error: can not find a valid leader for key

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidbv7.5.1版本使用br做全备有时成功有时失败,报错:can not find a valid leader for key

| username: Dean-tidb

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.5.1
[Issue Encountered] BR full backup sometimes succeeds and sometimes fails
[Cluster Configuration]

The BR tool's error log is as follows.
Complete Error Log:
err_log_file (3.0 MB)

Partial error messages:
[2024/05/10 03:00:20.789 +08:00] [WARN] [push.go:81] [“skip store”] [range-sn=1] [store-id=5] [error=“the store last heartbeat is too far, at 31m13.240519664s: [BR:KV:ErrKVStorage]tikv storage occur I/O error”]

[2024/05/10 03:04:31.571 +08:00] [ERROR] [client.go:1022] [“find region failed”] [range-sn=568] [error=“rpc error: code = Canceled desc = context canceled”] [errorVerbose=“rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240210135946-3488a653ddd9/client.go:1602\ngithub.com/tikv/pd/client.(*client).GetRegion\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240210135946-3488a653ddd9/client.go:947\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).findTargetPeer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1020\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1229\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1108\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650”] [region=null] [stack=“github.com/pingcap/tidb/br/pkg/backup.(*Client).findTargetPeer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1022\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1229\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1108”]

[2024/05/10 03:04:32.575 +08:00] [ERROR] [client.go:1051] [“can not find a valid target peer”] [range-sn=567] [key=7480000000000002FF1F5F698000000000FF0000030135313437FF30393932FF333331FF3834353633FF3200FF000000000000F803FF8000000000000000FF0380000000018C26FFE400000000000000F8] [stack=“github.com/pingcap/tidb/br/pkg/backup.(*Client).findTargetPeer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1051\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1229\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1108”]

[2024/05/10 03:04:32.608 +08:00] [ERROR] [main.go:60] [“br failed”] [error=“can not find a valid leader for key t\ufffd\u0000\u0000\u0000\u0000\u0000\u0002\ufffd\u001f_r\ufffd\u0000\u0000\u0000\u0000\ufffd\ufffd\u001et\u0000\u0000\u0000\u0000\u0000\ufffd: [BR:Backup:ErrBackupNoLeader]backup no leader”] [errorVerbose=“[BR:Backup:ErrBackupNoLeader]backup no leader\ncan not find a valid leader for key t\ufffd\u0000\u0000\u0000\u0000\u0000\u0002\ufffd\u001f_r\ufffd\u0000\u0000\u0000\u0000\ufffd\ufffd\u001et\u0000\u0000\u0000\u0000\u0000\ufffd\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).findTargetPeer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1053\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1229\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1108\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650”] [stack=“main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:60\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267”]

| username: 小龙虾爱大龙虾 | Original post link

Your cluster topology is quite unusual. Do you only have two machines? How many replicas are configured? With so many components deployed, are all of them in a normal state? I suspect the cluster itself may be unhealthy.
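One way to check component status is to look at how long ago each TiKV store last sent a heartbeat to PD, since the BR log above skips store 5 for exactly that reason. The following is a minimal sketch, not an official tool. It assumes the store list has the shape printed by the `pd-ctl store` command (a `stores` array whose items carry `store.id` and `status.last_heartbeat_ts`). The function names, the 10-minute threshold, and the entry for store 1 are hypothetical; the timestamp for store 5 is derived by subtracting the age reported in the log (31m13.240519664s) from the log time.

```python
import re
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp with a numeric UTC offset.

    Fractional seconds are truncated or padded to microseconds so that
    datetime.fromisoformat accepts nanosecond-precision values.
    """
    ts = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), ts, count=1)
    return datetime.fromisoformat(ts)


def stale_stores(stores_json: dict, now: datetime, max_age: timedelta):
    """Return (store_id, heartbeat_age) for stores whose heartbeat is older than max_age."""
    stale = []
    for item in stores_json.get("stores", []):
        last = parse_ts(item["status"]["last_heartbeat_ts"])
        age = now - last
        if age > max_age:
            stale.append((item["store"]["id"], age))
    return stale


# Illustrative input only: the JSON shape is assumed, store 1 is made up.
sample = {
    "stores": [
        {"store": {"id": 1}, "status": {"last_heartbeat_ts": "2024-05-10T03:00:15.000000000+08:00"}},
        {"store": {"id": 5}, "status": {"last_heartbeat_ts": "2024-05-10T02:29:07.548480336+08:00"}},
    ]
}
now = parse_ts("2024-05-10T03:00:20.789+08:00")  # time of the "skip store" log line
result = stale_stores(sample, now, timedelta(minutes=10))  # threshold is arbitrary
for store_id, age in result:
    print(f"store {store_id}: last heartbeat {age} ago")
```

A store that shows up here while its process is still running is worth investigating (disk, network, or resource contention between co-located instances) before retrying the backup.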

| username: TiDBer_QYr0vohO | Original post link

Are there only two PDs?

| username: Dean-tidb | Original post link

Yes, currently there are only two PDs.

| username: Dean-tidb | Original post link

Each TiKV node should hold one replica. If we want to reduce the number of instances, what would be a reasonable number?

| username: Dean-tidb | Original post link

That is because the production cluster is deployed on only two physical hosts.

| username: TiDBer_QYr0vohO | Original post link

With this topology, you have TiDB, PD, and TiKV all on one machine. Don’t do this in a production environment.

| username: Dean-tidb | Original post link

Hello, so ideally each machine should run only a single instance, whether that is a TiDB, a PD, or a TiKV instance, right?

| username: Dean-tidb | Original post link

I would like to ask: is the backup error caused by having too many instances deployed on one machine?

| username: TiDBer_QYr0vohO | Original post link

The error indicates that the Region leader for a certain key could not be obtained from PD.
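To see which table that key belongs to, the hex string from the `can not find a valid target peer` log line can be decoded. The sketch below assumes the logged key is in the memcomparable-encoded form that TiKV uses for Region ranges (groups of 8 bytes, each followed by a marker byte equal to `0xFF` minus the number of padding bytes) and that the decoded bytes follow the usual TiDB layout `t{tableID}_r...` or `t{tableID}_i{indexID}...`. The function names are hypothetical, not part of BR.

```python
import struct

# Key copied from the "can not find a valid target peer" log line.
KEY = (
    "7480000000000002FF1F5F698000000000FF0000030135313437FF30393932FF333331FF"
    "3834353633FF3200FF000000000000F803FF8000000000000000FF0380000000018C26FF"
    "E400000000000000F8"
)


def decode_memcomparable(encoded: bytes) -> bytes:
    """Strip the group markers and padding from a memcomparable-encoded byte string."""
    raw = bytearray()
    for i in range(0, len(encoded), 9):
        chunk = encoded[i:i + 9]
        if len(chunk) < 9:
            raise ValueError("truncated group")
        pad = 0xFF - chunk[8]  # number of padding bytes in this group
        if pad > 8:
            raise ValueError("invalid marker byte")
        raw += chunk[:8 - pad]
        if pad > 0:  # a padded group is always the last one
            break
    return bytes(raw)


def decode_int(b: bytes) -> int:
    """Decode a big-endian int64 stored with its sign bit flipped (non-negative IDs only)."""
    return struct.unpack(">Q", b)[0] ^ (1 << 63)


def describe_key(hex_key: str):
    """Return (table_id, kind, index_id) for a table key; index_id is None for row keys."""
    raw = decode_memcomparable(bytes.fromhex(hex_key))
    if raw[:1] != b"t" or len(raw) < 11:
        raise ValueError("not a table key")
    table_id = decode_int(raw[1:9])
    kind = {b"_r": "row", b"_i": "index"}.get(raw[9:11], "unknown")
    index_id = decode_int(raw[11:19]) if kind == "index" and len(raw) >= 19 else None
    return table_id, kind, index_id


print(describe_key(KEY))  # (543, 'index', 3)
```

Under these assumptions the key falls in index 3 of the table with ID 543, which narrows down which Region to inspect in PD.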

| username: 小龙虾爱大龙虾 | Original post link

Please review the deployment-related articles in the official documentation.

An incorrect deployment in a production environment may lead to unexpected results, such as loss of high availability.

| username: Dean-tidb | Original post link

Thank you very much.

| username: Dean-tidb | Original post link

Okay, thank you.

| username: 天下无贼 | Original post link

Are you still there? How did you solve this in the end? I encountered the same problem:

[2024/09/05 17:43:32.124 +08:00] [ERROR] [main.go:60] [“br failed”] [error=“can not find a valid leader for key t\ufffd\u0000\u0000\u0000\u0000\u0000\u000b\ufffd5_r\u0003\ufffd\u0000\u0000\u0000\ufffd\u0000\ufffd\ufffd\ufffd\u0003\ufffd\u0000\u0000\ufffd\u0000\u0000\u0000\u0000\ufffd\u0003\ufffd\u0000\ufffd\u0000\u0000\u0000\u0000\u0001\ufffd\u0000\u0000\ufffd: [BR:Backup:ErrBackupNoLeader]backup no leader”] [errorVerbose=“[BR:Backup:ErrBackupNoLeader]backup no leader\ncan not find a valid leader for key t\ufffd\u0000\u0000\u0000\u0000\u0000\u000b\ufffd5_r\u0003\ufffd\u0000\u0000\u0000\ufffd\u0000\ufffd\ufffd\ufffd\u0003\ufffd\u0000\u0000\ufffd\u0000\u0000\u0000\u0000\ufffd\u0003\ufffd\u0000\ufffd\u0000\u0000\u0000\u0000\u0001\ufffd\u0000\u0000\ufffd\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).findTargetPeer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1025\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1228\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1080\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598”] [stack=“main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:60\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”]

| username: cchouqiang | Original post link

This is most likely caused by poor disk performance.