Error Occurred When Using BR to Back Up Data to S3 for a TiDB Cluster Deployed on K8s

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: k8s上部署的TiDB集群使用br备份数据到s3时报错

| username: panqiao

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Reproduction Path] Following the official documentation, using BR to back up the database to S3 storage results in an error
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]

backup-rbac.yaml
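The poster did not paste the file contents. For reference, the `backup-rbac.yaml` from the TiDB Operator examples looks roughly like the sketch below (resource names and the exact rule list may differ by Operator version, so treat this as an assumption, not the poster's actual file):

```yaml
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tidb-backup-manager
  labels:
    app.kubernetes.io/component: tidb-backup-manager
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["*"]
- apiGroups: ["pingcap.com"]
  resources: ["backups", "restores"]
  verbs: ["get", "watch", "list", "update"]
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: tidb-backup-manager
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tidb-backup-manager
  labels:
    app.kubernetes.io/component: tidb-backup-manager
subjects:
- kind: ServiceAccount
  name: tidb-backup-manager
roleRef:
  kind: Role
  name: tidb-backup-manager
  apiGroup: rbac.authorization.k8s.io
```

It is applied with `kubectl apply -f backup-rbac.yaml -n tidb-cluster` so the backup Job's ServiceAccount can create events and update the Backup CR.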

kubectl create secret generic s3-secret --from-literal=access_key=xxxxx --from-literal=secret_key=xxxxxx --namespace=tidb-cluster

backup-tidb-s3.yaml
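This file was also not pasted. A minimal `Backup` custom resource for BR-to-S3, as shown in the TiDB Operator docs, looks roughly like the following; the cluster name, region, bucket, and prefix below are placeholders, and the `s3-secret` name matches the secret created above:

```yaml
---
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup-s3
  namespace: tidb-cluster
spec:
  backupType: full
  br:
    cluster: basic              # placeholder: name of the TidbCluster
    clusterNamespace: tidb-cluster
    sendCredToTikv: true        # pass the S3 credentials to the TiKV nodes
  s3:
    provider: aws
    secretName: s3-secret       # the secret created with kubectl above
    region: us-west-2           # placeholder
    bucket: my-bucket           # placeholder
    prefix: backup-folder       # placeholder
```

Applying it with `kubectl apply -f backup-tidb-s3.yaml` creates a Job that runs BR against the cluster.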

[Attachment: Screenshot/Log/Monitoring]
I1208 10:56:16.200308 9 backup.go:93] [2022/12/08 10:56:16.200 +00:00] [INFO] [client.go:610] ["finish backup push down"] [range-sn=56] [small-range-count=0]

I1208 10:56:16.200314 9 backup.go:93] [2022/12/08 10:56:16.200 +00:00] [INFO] [client.go:718] ["start fine grained backup"] [range-sn=56] [incomplete=1]
I1208 10:56:16.200446 9 backup.go:93] [2022/12/08 10:56:16.200 +00:00] [INFO] [client.go:1041] ["try backup"] [range-sn=55] [store-id=5] ["retry time"=0]
I1208 10:56:16.201802 9 backup.go:93] [2022/12/08 10:56:16.201 +00:00] [INFO] [client.go:672] ["find leader"] [Leader="{"id":26000,"store_id":4}"] [key=7480000000000027FF185F720000000000FF0000000000000000FA]
I1208 10:56:46.202196 9 backup.go:93] [2022/12/08 10:56:46.202 +00:00] [INFO] [client.go:1041] ["try backup"] [range-sn=56] ["retry time"=0]
I1208 10:56:46.202221 9 backup.go:93] [2022/12/08 10:56:46.202 +00:00] [INFO] [client.go:1041] ["try backup"] [range-sn=54] ["retry time"=0]
I1208 10:56:46.202227 9 backup.go:93] [2022/12/08 10:56:46.202 +00:00] [INFO] [client.go:1041] ["try backup"] [range-sn=57] [store-id=1806129] ["retry time"=0]

I1208 10:56:46.202427 9 backup.go:93] [2022/12/08 10:56:46.202 +00:00] [WARN] [push.go:86] ["fail to connect store, skipping"] [range-sn=55] [store-id=1815001] [error="[BR:Common:ErrFailedToConnect]failed to make connection to store 1815001: context deadline exceeded"] [errorVerbose="[BR:Common:ErrFailedToConnect]failed to make connection to store 1815001: context deadline exceeded\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.(*Mgr).getGrpcConnLocked\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:352\ngithub.com/pingcap/tidb/br/pkg/conn.(*Mgr).GetBackupClient\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:371\ngithub.com/pingcap/tidb/br/pkg/backup.(*pushDown).pushBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/push.go:82\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:606\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]

These INFO and WARN messages keep repeating. The directory I created on S3 exists, but no data is being written into it. Can anyone explain what is going on?

| username: xfworld | Original post link

Check this TiKV node, store 1815001…

| username: panqiao | Original post link

I currently have only three TiKV nodes, but four stores are registered here. A scale-in was performed earlier, and I cannot delete this store no matter what I try. Please help.

| username: xfworld | Original post link

You can use this

| username: yiduoyunQ | Original post link

  1. Use kubectl get tc {tc-name} -o yaml to check whether the current TC’s replica is 3 or 4.
  2. Use kubectl logs tidb-controller-manager-{xxxx} to check the operator’s output logs.
  3. Refer to the "TiDB Scheduling" page in the PingCAP docs (TiDB 数据库的调度 | PingCAP 文档中心) for the TiKV state transition diagram, and manually delete the store using pd-ctl (normally, the operator should do this automatically, but there might be an issue here).
  4. (Optional) After the state changes to tombstone, refer to the above link and use pd-ctl store remove-tombstone to clean up the tombstone store.
  5. If you want to find the cause, you need to first confirm the information from steps 1 and 2.
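The steps above can be sketched as the following commands. The names `basic`, `tidb-cluster`, `tidb-admin`, and the pod suffixes are placeholders for this poster's environment, and the `pd-ctl` path inside the PD pod may differ by image version:

```shell
# 1. Check the desired TiKV replica count in the TidbCluster spec
kubectl -n tidb-cluster get tc basic -o jsonpath='{.spec.tikv.replicas}'

# 2. Check the operator's logs (pod name is a placeholder)
kubectl -n tidb-admin logs tidb-controller-manager-xxxx

# 3. From a PD pod, list stores and delete the stale one (ID from the log above)
kubectl -n tidb-cluster exec -it basic-pd-0 -- ./pd-ctl store
kubectl -n tidb-cluster exec -it basic-pd-0 -- ./pd-ctl store delete 1815001

# 4. (Optional) once the store is Tombstone, clean up its record
kubectl -n tidb-cluster exec -it basic-pd-0 -- ./pd-ctl store remove-tombstone
```

Because the cluster runs on Kubernetes, pd-ctl is run inside a PD pod rather than via tiup.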

| username: panqiao | Original post link

However, the store status here is Down, and since my TiDB cluster is deployed on Kubernetes, there is no tiup command available.

| username: panqiao | Original post link

Here is the result of step 1 (only the TiKV part is shown):


The result of step 2 is basically the same error message repeated.

| username: panqiao | Original post link

I executed the following command:
./pd-ctl store delete 1815001

Then the store status changed to Offline, and after about two minutes it changed to Tombstone. The store's record has now been deleted. I'll try backing up to S3 again.
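As step 4 above noted, once the store shows Tombstone its record can also be removed from PD entirely (run with the same pd-ctl binary as the delete command):

```shell
./pd-ctl store remove-tombstone
```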

| username: jansu-dev | Original post link

Hello, the issue has been resolved. Don’t forget to mark it as the “Best Answer”.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.