Failed to Perform Full Data Backup Using BR in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb使用br全量备份数据失败

| username: ks_ops_ms

[TiDB Usage Environment] Testing
[Encountered Issue: Issue Phenomenon and Impact]
[Attachment: Screenshot/Log/Monitoring]
Command used: ./br backup full --pd "10.33.65.73:2379" --storage "local:///tidb-backup/test-tidb/" --log-file test-backupfull.log
Detail BR log in test-backupfull.log
[2023/11/02 16:59:54.194 +08:00] [INFO] [collector.go:77] ["Full Backup failed summary"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]
Error: running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup test-tidb-pd-1.test-tidb-pd-peer.base-server.svc on 127.0.0.53:53: no such host"
The local backup path is an NFS mount on the machine where BR runs.

| username: ks_ops_ms | Original post link

TiDB version 7.1.0, BR version 7.1.0

| username: ks_ops_ms | Original post link

Log information

| username: zhanggame1 | Original post link

Can the machine executing the backup command access port 2379 of 10.33.65.73?

| username: ks_ops_ms | Original post link

This IP belongs to the listener of a load balancer I set up in front of PD on port 2379. Running nc -z 10.33.65.73 2379 succeeds.
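
A reachable port may not be enough here. The error in the first post shows BR dialing test-tidb-pd-1.test-tidb-pd-peer.base-server.svc, a name that was never passed to --pd, which suggests BR learned it from PD itself. A minimal sketch for listing the client URLs that PD advertises, assuming the PD HTTP API path /pd/api/v1/members is reachable through the load balancer; the sample JSON is a hypothetical, trimmed response and pd_client_urls is an illustrative helper name:

```shell
#!/usr/bin/env bash
# Print the distinct client URLs found in a PD members response read from stdin.
# Plain text processing is used so that no JSON tool is required.
pd_client_urls() {
  tr -d ' \n\t' \
    | grep -oE '"client_urls":\[[^]]*\]' \
    | grep -oE 'https?://[^"]+' \
    | sort -u
}

# Hypothetical sample response, trimmed to the relevant fields.
sample='{"members":[{"name":"test-tidb-pd-1","peer_urls":["http://test-tidb-pd-1.test-tidb-pd-peer.base-server.svc:2380"],"client_urls":["http://test-tidb-pd-1.test-tidb-pd-peer.base-server.svc:2379"]}]}'
printf '%s' "$sample" | pd_client_urls

# Against the real cluster (assumes curl is installed):
# curl -s http://10.33.65.73:2379/pd/api/v1/members | pd_client_urls
```

If the URLs printed for the real cluster are in-cluster service names like the one above, a machine outside Kubernetes will probably not be able to reach them even though the load balancer address answers.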

| username: pepezzzz | Original post link

It is recommended to use the BACKUP CRD method for backups in a K8S environment.

| username: 大飞哥online | Original post link

Just add the parameter --check-requirements=false to ignore it.

| username: zhanggame1 | Original post link

curl http://10.33.65.73:2379

| username: Fly-bird | Original post link

lookup test-tidb-pd-1.test-tidb-pd-peer.base-server.svc on 127.0.0.53:53: no such host. This part is not right.
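
To confirm that point from the machine running BR, the failing hostname can be pulled out of the error text and checked against the local resolver. This is only a sketch: failed_lookup_host is an illustrative helper name, and getent is assumed to be available (it is on most glibc-based Linux systems).

```shell
#!/usr/bin/env bash
# Extract the hostname that failed DNS lookup from a BR error line on stdin.
failed_lookup_host() {
  sed -n 's/.*lookup \([^ ]*\) on [^ ]*: no such host.*/\1/p'
}

# Error text copied from the first post (shortened at the front).
err='dial tcp: lookup test-tidb-pd-1.test-tidb-pd-peer.base-server.svc on 127.0.0.53:53: no such host'
host="$(printf '%s\n' "$err" | failed_lookup_host)"
echo "host: $host"

# Ask the local resolver whether this machine can resolve the name.
if getent hosts "$host" >/dev/null 2>&1; then
  echo "resolvable from this machine"
else
  echo "not resolvable from this machine"
fi
```

A name ending in .svc is normally served only by the cluster DNS, so a failed lookup on a host outside Kubernetes would be expected.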

| username: tidb菜鸟一只 | Original post link

Refer to this. Backups on K8s work differently from backups on physical machines: Back Up a TiDB Cluster to Persistent Volumes | PingCAP Docs
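
For orientation, a Backup resource for this setup might look roughly like the sketch below. The cluster name test-tidb and namespace base-server are inferred from the hostname in the error message; the resource name, NFS server address, and export path are placeholders that do not come from the thread. The field layout follows the persistent-volume backup document as far as I recall it, so check it against the TiDB Operator version in use.

```shell
#!/usr/bin/env bash
# Placeholder value: replace with the real NFS server before applying.
NFS_SERVER="${NFS_SERVER:-nfs.example.internal}"

# Write a Backup manifest for a full backup to an NFS-backed volume.
cat > backup-full.yaml <<EOF
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: test-tidb-full-backup   # hypothetical resource name
  namespace: base-server        # inferred from the PD hostname
spec:
  br:
    cluster: test-tidb          # inferred from the PD hostname
    clusterNamespace: base-server
  local:
    prefix: test-tidb
    volume:
      name: nfs
      nfs:
        server: ${NFS_SERVER}
        path: /tidb-backup      # placeholder export path
    volumeMount:
      name: nfs
      mountPath: /tidb-backup
EOF

# Review the manifest, then apply it from a machine with cluster access:
# kubectl apply -f backup-full.yaml
```

The backup job also runs under a service account, so the RBAC objects described in that document need to exist in the same namespace before the Backup resource is applied.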

| username: ks_ops_ms | Original post link

Normal

| username: ks_ops_ms | Original post link

I’ll try using the backup CRD.

| username: 像风一样的男子 | Original post link

PD does not require load balancing.

| username: ks_ops_ms | Original post link

At first, I wanted to use SQL to back up through PD. Since PD runs inside the cluster and I connect remotely, I added a load balancer for external access. That did not work well, so I switched to the Backup CRD.

| username: ks_ops_ms | Original post link

After starting the job through CRD, this issue appears in the job logs. It seems to be due to the role not having the necessary permissions. I tried re-binding the role, but it still doesn’t work.

| username: ks_ops_ms | Original post link

Another issue occurred during the backup process using CRD. The log error information is as follows:
[2023/11/03 11:55:12.102 +08:00] [ERROR] [backup.go:54] ["failed to backup"] [error="failed to backup to file:////test-tidb/test-tidb-pd.base-server-2379-2023-11-03t10-55-00, because the checkpoint mode is used, but the hashes of the configs are not the same. Please check the config: [BR:Common:ErrInvalidArgument]invalid argument"] [errorVerbose="[BR:Common:ErrInvalidArgument]invalid argument\nfailed to backup to file:////test-tidb/test-tidb-pd.base-server-2379-2023-11-03t10-55-00, because the checkpoint mode is used, but the hashes of the configs are not the same. Please check the config\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).CheckCheckpoint\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:267\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/backup.go:447\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:53\nmain.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:143\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:58\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] [stack="main.runBackupCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:54\nmain.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:143\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:58\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

| username: ks_ops_ms | Original post link

Reassigning the role and rolebinding resolved this issue.

| username: ks_ops_ms | Original post link

There is currently an error in the job that causes the backup task to fail.

I can't figure it out.

| username: tidb菜鸟一只 | Original post link

It looks like there is a problem with the parameters. Can you post the YAML configuration file you are currently using?

| username: ks_ops_ms | Original post link

After I changed the configuration, another issue started to occur.