There is an issue with BR backup version selection, causing TiDB backup failure

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: BR备份版本选择有问题,导致tidb备份失败

| username: Hacker_9LYnzJhP

【TiDB Usage Environment】Production Environment

TiDB Operator: tidb-operator-1.1.6

K8S Version: GitVersion: "v1.16.4-12.8d683d9"

【TiDB Version】
v5.4.0
【Encountered Problem】

$ helm install tpaas-tidb-backup tidb-full-backup/ -n tpaas-new-tidb
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/tpaasbjopdba/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/tpaasbjopdba/.kube/config
NAME: tpaas-tidb-backup
LAST DEPLOYED: Tue Jun 28 20:10:47 2022
NAMESPACE: tpaas-new-tidb
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get pods -n tpaas-new-tidb
NAME READY STATUS RESTARTS AGE
backup-daemon-backup-s3-lktnr 0/1 Error 0 6s
tpaas-new-tidb-discovery-6557f7b5f6-5kdzg 1/1 Running 0 5h43m
tpaas-new-tidb-monitor-596ff78c4b-gnt6c 3/3 Running 0 4h7m
tpaas-new-tidb-pd-0 1/1 Running 0 3h53m
tpaas-new-tidb-pd-1 1/1 Running 0 4h2m
tpaas-new-tidb-pd-2 1/1 Running 0 4h7m
tpaas-new-tidb-ticdc-0 1/1 Running 0 3h32m
tpaas-new-tidb-tidb-0 2/2 Running 0 3h28m
tpaas-new-tidb-tidb-1 2/2 Running 0 3h30m
tpaas-new-tidb-tidb-2 2/2 Running 0 3h31m
tpaas-new-tidb-tidb-initializer-9h2rb 0/1 Completed 0 5h43m
tpaas-new-tidb-tikv-0 1/1 Running 0 3h32m
tpaas-new-tidb-tikv-1 1/1 Running 0 3h42m
tpaas-new-tidb-tikv-2 1/1 Running 0 3h47m

We used the backup-s3 manifest (tidb-operator/manifests/backup/backup-s3-br.yaml at master · pingcap/tidb-operator · GitHub) for the backup. The backup failed because the BR version was selected incorrectly: BR chose v4.0.7 against a v5.4.0 cluster. A sketch of the Backup CR is shown below.
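For reference, here is a minimal sketch of a Backup CR matching the referenced manifest, with values reconstructed from the br arguments in the log below; the Secret name is a placeholder:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: daemon-backup-s3
  namespace: tpaas-new-tidb
spec:
  br:
    # name and namespace of the TidbCluster to back up
    cluster: tpaas-new-tidb
    clusterNamespace: tpaas-new-tidb
  s3:
    provider: ceph
    region: cn-north-1
    endpoint: http://s3-internal.cn-north-1.jdcloud-oss.com
    bucket: tpaas-tidb-backup
    prefix: tpaas-tidb-new/backup/06281700
    # placeholder: Secret holding the S3 access key and secret key
    secretName: s3-secret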
【Problem Phenomenon and Impact】
Create rclone.conf file.
/tidb-backup-manager backup --namespace=tpaas-new-tidb --backupName=daemon-backup-s3 --tikvVersion=v5.4.0
I0628 12:10:48.842586 1 backup.go:71] start to process backup tpaas-new-tidb/daemon-backup-s3
I0628 12:10:48.853024 1 backup_status_updater.go:64] Backup: [tpaas-new-tidb/daemon-backup-s3] updated successfully
I0628 12:10:48.875320 1 backup_status_updater.go:64] Backup: [tpaas-new-tidb/daemon-backup-s3] updated successfully
I0628 12:10:48.880814 1 manager.go:176] cluster tpaas-new-tidb/daemon-backup-s3 tikv_gc_life_time is 10m0s
I0628 12:10:48.891028 1 manager.go:240] set cluster tpaas-new-tidb/daemon-backup-s3 tikv_gc_life_time to 72h success
I0628 12:10:48.891063 1 backup.go:67] Running br command with args: [backup full --pd=tpaas-new-tidb-pd.tpaas-new-tidb:2379 --storage=s3://tpaas-tidb-backup/tpaas-tidb-new/backup/06281700/ --s3.region=cn-north-1 --s3.provider=ceph --s3.endpoint=http://s3-internal.cn-north-1.jdcloud-oss.com]
I0628 12:10:48.915795 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:35] ["Welcome to Backup & Restore (BR)"]
I0628 12:10:48.915820 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:36] [BR] [release-version=v4.0.7]
I0628 12:10:48.915830 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:37] [BR] [git-hash=4d29fcccaa12d6355a829a69b8df1594281a14e2]
I0628 12:10:48.915839 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:38] [BR] [git-branch=heads/refs/tags/v4.0.7]
I0628 12:10:48.915845 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:39] [BR] [go-version=go1.13]
I0628 12:10:48.915851 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:40] [BR] [utc-build-time="2020-09-29 06:52:02"]
I0628 12:10:48.915857 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:41] [BR] [race-enabled=false]
I0628 12:10:48.915868 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [common.go:378] [arguments] [pd="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [s3.endpoint=http://s3-internal.cn-north-1.jdcloud-oss.com] [s3.provider=ceph] [s3.region=cn-north-1] [storage=s3://tpaas-tidb-backup/tpaas-tidb-new/backup/06281700/]
I0628 12:10:48.915937 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [client.go:148] ["[pd] create pd client with endpoints"] [pd-address="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"]
I0628 12:10:48.923721 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:237] ["[pd] update member urls"] [old-urls="[http://tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [new-urls="[http://tpaas-new-tidb-pd-0.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-1.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379]"]
I0628 12:10:48.923735 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:253] ["[pd] switch leader"] [new-leader=http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379] [old-leader=]
I0628 12:10:48.923753 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:103] ["[pd] init cluster id"] [cluster-id=7114173756406158756]
I0628 12:10:48.941415 1 backup.go:91] [2022/06/28 12:10:48.941 +00:00] [INFO] [client.go:148] ["[pd] create pd client with endpoints"] [pd-address="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"]
I0628 12:10:48.948296 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:237] ["[pd] update member urls"] [old-urls="[http://tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [new-urls="[http://tpaas-new-tidb-pd-0.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-1.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379]"]
I0628 12:10:48.948326 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:253] ["[pd] switch leader"] [new-leader=http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379] [old-leader=]
I0628 12:10:48.948372 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:103] ["[pd] init cluster id"] [cluster-id=7114173756406158756]
I0628 12:10:48.954318 1 backup.go:91] [2022/06/28 12:10:48.954 +00:00] [INFO] [collector.go:187] ["Full backup Failed summary : total backup ranges: 0, total success: 0, total failed: 0"]
I0628 12:10:48.954515 1 backup.go:91] [2022/06/28 12:10:48.954 +00:00] [ERROR] [backup.go:25] ["failed to backup"] [error="running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: TiKV node tpaas-new-tidb-tikv-1.tpaas-new-tidb-tikv-peer.tpaas-new-tidb.svc:20160 version 5.4.0 and BR v4.0.7 major version mismatch, please use the same version of BR"] [errorVerbose="TiKV node tpaas-new-tidb-tikv-1.tpaas-new-tidb-tikv-peer.tpaas-new-tidb.svc:20160 version 5.4.0 and BR v4.0.7 major version mismatch, please use the same version of BR
github.com/pingcap/br/pkg/utils.CheckClusterVersion
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/utils/version.go:123
github.com/pingcap/br/pkg/conn.NewMgr
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/conn/conn.go:203
github.com/pingcap/br/pkg/task.newMgr
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/task/common.go:312
github.com/pingcap/br/pkg/task.RunBackup
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/task/backup.go:178
github.com/pingcap/br/cmd.runBackupCommand
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:24
github.com/pingcap/br/cmd.newFullBackupCommand.func1
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:84
github.com/spf13/cobra.(*Command).execute
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/main.go:57
runtime.main
\t/usr/local/go/src/runtime/proc.go:203
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357
running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip."] [stack="github.com/pingcap/br/cmd.runBackupCommand
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:25
github.com/pingcap/br/cmd.newFullBackupCommand.func1
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:84
github.com/spf13/cobra.(*Command).execute
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950
github.com/spf13/cobra.(*Command).Execute
\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
main.main
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/main.go:57
runtime.main
\t/usr/local/go/src/runtime/proc.go:203”]

| username: Cabbager | Original post link

Try using the same version for TiDB and BR. It is recommended not to use BR across major versions.

| username: gejibin | Original post link

With TiDB 4.0.16 and operator 1.1.6, BR still uses 4.0.7 when creating backups.

| username: gejibin | Original post link

How does tidb-operator-1.1.6 determine which version of BR to use?

| username: Kongdom | Original post link

You can refer to the version compatibility in the official documentation:
https://docs.pingcap.com/zh/tidb/v5.4/backup-and-restore-tool#使用限制

| username: gejibin | Original post link

How do I specify the version of BR in tidb-operator-1.1.6?

| username: Cabbager | Original post link

You can refer to this document.

| username: Kongdom | Original post link

TiDB Operator is an automated operation and maintenance system for TiDB clusters on Kubernetes, providing full lifecycle management of TiDB, including deployment, upgrades, scaling, backup and recovery, and configuration changes.

The choice of BR version should be based on the version of the TiDB database, not the version of the operations tooling. That's how I understand it.

| username: gejibin | Original post link

Let’s try specifying the version.

| username: Kongdom | Original post link

Additionally, the error message indicates version 5.4, but your reply mentions version 4.0.16. It is important to confirm the exact version of the TiDB cluster.

| username: gejibin | Original post link

We tested both 5.4 and 4.0.16, but BR always uses 4.0.7.
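To confirm which image (and thus which bundled BR) the failed backup pod is actually running, you can inspect the pod spec; the pod name here is taken from the kubectl output in the original post:

kubectl -n tpaas-new-tidb get pod backup-daemon-backup-s3-lktnr -o jsonpath='{.spec.containers[*].image}'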

| username: Kongdom | Original post link

Try using version 5.0 of BR on the 5.4 cluster.

| username: gejibin | Original post link

The operator v1.1.6 does not support this configuration item; it is only supported starting from v1.1.9.

| username: gejibin | Original post link

The version of BR cannot be specified; it is controlled by the operator.

| username: Cabbager | Original post link

It should be supported starting from version 1.1.7, as described in the release notes:

It looks like you can only upgrade the operator.
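If toolImage is indeed available from v1.1.7 onward, then after upgrading the operator the BR image could be pinned to the cluster version in the Backup CR, roughly like this (a sketch; the image tag should match your TiDB version):

spec:
  # run the backup with a BR image matching the cluster version
  toolImage: pingcap/br:v5.4.0
  br:
    cluster: tpaas-new-tidb
    clusterNamespace: tpaas-new-tidb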

| username: gejibin | Original post link

In theory, that shouldn't be necessary. The 1.1.6 operator should also support BR 4.0.8 or 4.0.16, not just 4.0.7.

| username: Cabbager | Original post link

Could you describe your specific steps?
Did you use tidb-operator to deploy a v4.0.7 TiDB and BR, and then upgrade TiDB to v5.4.0?
Or did you directly deploy v5.4.0 TiDB and then deploy BR?

| username: gejibin | Original post link

With version 1.1.6 you don't deploy BR separately: you only deploy the operator and create a TiDB instance, which can be version 4.0.8/4.0.16, but BR always uses version 4.0.7.
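To see where v4.0.7 comes from, you can dump the operator's computed Helm values, which should include the tidb-backup-manager image that runs the backup job (release name and namespace here are assumptions; adjust to your deployment):

helm get values tidb-operator -n tidb-admin --all | grep -i image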

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.