Failed to Create Backup Using BackupSchedule Object in TiDB 7.1.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 7.1.0版本通过backupschdule对象创建备份失败

| username: liyuntang

[TiDB Usage Environment] Test
[TiDB Version] 7.1.0
[Reproduction Path] Back up TiDB by creating a bks (BackupSchedule) object
[Encountered Issues: Problem Phenomenon and Impact]
Issues:

  1. Backup fails when bks calls br, but manual br call succeeds.
  2. The libresolv.so.2 library file already exists in the container, why does bks report that it cannot find this file during backup?

br backup error log:
[root@vm10-0-2-6 ~]# kubectl logs backup-4494cccb-9cda-4a85-805f-9f62b78dec62-vzdvg -n $ns
Create rclone.conf file.
/tidb-backup-manager backup --namespace=b0b996d0-4c67-4c13-be35-c712c7162c05 --backupName=4494cccb-9cda-4a85-805f-9f62b78dec62 --tikvVersion=v7.1.0
I0705 10:58:26.075002 9 backup.go:72] start to process backup b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62
I0705 10:58:27.651814 9 backup_status_updater.go:86] Backup: [b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62] updated successfully
E0705 10:58:27.654504 9 backup_status_updater.go:89] Failed to update backup [b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62], error: Operation cannot be fulfilled on backups.pingcap.com "4494cccb-9cda-4a85-805f-9f62b78dec62": the object has been modified; please apply your changes to the latest version and try again
I0705 10:58:27.686458 9 backup_status_updater.go:86] Backup: [b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62] updated successfully
I0705 10:58:27.686491 9 backup.go:69] Running br command with args: [backup full --pd=basic-pd.b0b996d0-4c67-4c13-be35-c712c7162c05:2379 --send-credentials-to-tikv=true --storage=s3://tidb-ks3-bucket/2000104981/b0b996d0-4c67-4c13-be35-c712c7162c05/manual/4494cccb-9cda-4a85-805f-9f62b78dec62 --s3.region=BEIJING --s3.provider=other --s3.endpoint=http://ks3-cn-beijing-internal.ksyun.com]
I0705 10:58:27.687946 9 backup.go:93]
I0705 10:58:27.687971 9 backup.go:100] Error loading shared library libresolv.so.2: No such file or directory (needed by /var/lib/br-bin/br)
E0705 10:58:27.688042 9 manager.go:293] backup cluster b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62 data failed, err: cluster b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62, wait pipe message failed, errMsg Error loading shared library libresolv.so.2: No such file or directory (needed by /var/lib/br-bin/br)
, err: exit status 127
E0705 10:58:27.697283 9 backup_status_updater.go:89] Failed to update backup [b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62], error: Operation cannot be fulfilled on backups.pingcap.com "4494cccb-9cda-4a85-805f-9f62b78dec62": the object has been modified; please apply your changes to the latest version and try again
I0705 10:58:27.713958 9 backup_status_updater.go:86] Backup: [b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62] updated successfully
error: cluster b0b996d0-4c67-4c13-be35-c712c7162c05/4494cccb-9cda-4a85-805f-9f62b78dec62, wait pipe message failed, errMsg Error loading shared library libresolv.so.2: No such file or directory (needed by /var/lib/br-bin/br)
, err: exit status 127
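For anyone triaging a similar failure, the decisive line in the log above is the loader error; the "object has been modified" conflicts are each followed by a successful update. Below is a minimal Python sketch that extracts the missing library and the binary that needs it from a saved log. The function name `find_missing_libraries` and the regular expression are illustrative assumptions based only on the message format shown here; they are not part of tidb-backup-manager.

```python
import re

# Pattern for the loader message format seen in the log above, e.g.
# "Error loading shared library libresolv.so.2: No such file or directory
#  (needed by /var/lib/br-bin/br)". The pattern itself is an assumption.
MISSING_LIB = re.compile(
    r"Error loading shared library (?P<lib>\S+): "
    r"No such file or directory \(needed by (?P<binary>[^)]+)\)"
)


def find_missing_libraries(log_text):
    """Return a sorted list of unique (library, binary) pairs reported as missing."""
    pairs = {
        (match.group("lib"), match.group("binary"))
        for match in MISSING_LIB.finditer(log_text)
    }
    return sorted(pairs)


if __name__ == "__main__":
    sample = (
        "I0705 10:58:27.687971 9 backup.go:100] Error loading shared library "
        "libresolv.so.2: No such file or directory (needed by /var/lib/br-bin/br)\n"
    )
    for lib, binary in find_missing_libraries(sample):
        print(f"{binary} needs {lib}, which the loader could not find")
```

The same message is repeated several times in the log, so the sketch deduplicates the pairs before returning them.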

[Resource Configuration]
tidb-operator version is 1.3.3
tidb, pd, tikv configuration: 8 cores, 20G

[Attachments: Screenshots/Logs/Monitoring]


bks configuration:
apiVersion: v1
items:
- apiVersion: pingcap.com/v1alpha1
  kind: BackupSchedule
  metadata:
    creationTimestamp: "2023-07-03T02:39:23Z"
    generation: 19
    name: basic-backup-schedule
    namespace: b0b996d0-4c67-4c13-be35-c712c7162c05
    resourceVersion: "527849164"
    selfLink: /apis/pingcap.com/v1alpha1/namespaces/b0b996d0-4c67-4c13-be35-c712c7162c05/backupschedules/basic-backup-schedule
    uid: d0446128-18aa-4f58-8e99-ef5e0b1326b1
  spec:
    backupTemplate:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - cn-qingyangtest-1a
      backupType: full
      br:
        cluster: basic
        clusterNamespace: b0b996d0-4c67-4c13-be35-c712c7162c05
        sendCredToTikv: true
      cleanPolicy: Retain
      resources: {}
      s3:
        bucket: tidb-ks3
        endpoint: http://ks3-cn.com
        prefix: 2000104981/b0b996d0-4c67-
        provider: other
        region: BEIJING
        secretName: ks3-secret
    imagePullSecrets:
    - name: nosql-image
    maxBackups: 0
    schedule: 58 10 * * *
    storageClassName: openebs-hostpath
  status:
    lastBackup: 4494cccb-9cda-4a85-805f-9f62b78dec62
    lastBackupTime: "2023-07-05T02:58:00Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

| username: tidb狂热爱好者 | Original post link

This looks like an operating system issue. Kubernetes installed on Ubuntu 22.04 works fine, but it stops working once the pod's underlying system is switched to CentOS 8.

| username: liyuntang | Original post link

So you mean the operating system on the k8s nodes needs to be changed to CentOS 8?

| username: liyuntang | Original post link

I do not quite follow this part. Does the node's operating system affect how a pod runs?

| username: yiduoyunQ | Original post link

Did backups work correctly on the earlier versions you ran?

| username: liyuntang | Original post link

Versions 4.0.12, 5.4.1, and 6.5.0 do not have this issue.

| username: liyuntang | Original post link

Actually, the confusing part of this issue is that manual BR works, but it doesn’t work when called through the backup object.

| username: yiduoyunQ | Original post link

Reason: There is currently a compatibility issue between the Alpine container image used for backups and newer versions of Go. Starting from v7.0, a newer version of Go is used for compilation, and Go has a related upstream issue: "Error loading shared library libresolv.so.2 on Alpine in Go 1.20" (golang/go issue #59305).
Solution: Wait for the official fix for the container image.
Workaround: Run the backup manually with a local br binary.
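As a rough illustration of the manual workaround, the sketch below rebuilds the same `br backup full` argument list that the backup manager printed in the log, so it can be passed to a locally installed br. Only flags that appear in that log line are used. The helper name `build_br_backup_args` and the sample values in the demo are hypothetical, and credentials for the S3-compatible storage are not covered here.

```python
import shlex


def build_br_backup_args(pd_addr, storage, region, provider, endpoint):
    """Build the argument list for `br backup full`, mirroring the flags in the log."""
    return [
        "backup",
        "full",
        f"--pd={pd_addr}",
        "--send-credentials-to-tikv=true",
        f"--storage={storage}",
        f"--s3.region={region}",
        f"--s3.provider={provider}",
        f"--s3.endpoint={endpoint}",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own PD address, bucket, and endpoint.
    args = build_br_backup_args(
        pd_addr="basic-pd.my-namespace:2379",
        storage="s3://my-bucket/my-prefix",
        region="BEIJING",
        provider="other",
        endpoint="http://s3.example.internal",
    )
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(["br", *args]))
```

Printing the command rather than running it keeps the sketch safe to try; once the output looks right, the list can be handed to `subprocess.run(["br", *args], check=True)`.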

| username: liyuntang | Original post link

Got it, thanks!

| username: yiduoyunQ | Original post link

Operator v1.5.0-beta.1 has fixed this issue, and the corresponding fix will be merged into the latest v1.4 release later. However, it is unlikely to be merged into v1.3. It is recommended that TiDB clusters at v6.5 and above use Operator v1.4 and above. Refer to the TiDB Operator overview in the PingCAP documentation center.
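The recommendation above can be written down as a small check. This is only a sketch of the rule of thumb stated in this reply (TiDB v6.5 and above on Operator v1.4 and above). It does not tell you which exact v1.4 patch release carries the libresolv fix, because the thread does not name one. Both function names are hypothetical.

```python
def parse_version(text):
    """Turn 'v1.3.3' or '7.1.0' into (1, 3, 3); a suffix such as '-beta.1' is ignored."""
    core = text.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def operator_meets_recommendation(tidb_version, operator_version):
    """Apply the rule quoted above: TiDB v6.5+ should run on Operator v1.4+."""
    if parse_version(tidb_version) < (6, 5):
        # The recommendation in the thread does not cover older clusters.
        return True
    return parse_version(operator_version) >= (1, 4)


if __name__ == "__main__":
    # The combination reported in the original post: TiDB 7.1.0 on Operator 1.3.3.
    print(operator_meets_recommendation("7.1.0", "1.3.3"))
```

For the setup in the original post the check returns False, which matches the advice to upgrade the operator.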

| username: liyuntang | Original post link

Understood, upgrading the operator is already in the plan. We will first upgrade to version 6.5.3 and then proceed with the upgrade.

Additionally, I have another question:
Which version of BR does the BR program in the br:v6.5.3 image correspond to?

| username: yiduoyunQ | Original post link

Refer to .spec.toolImage in the backup and restore CR introduction in the PingCAP documentation center.

| username: liyuntang | Original post link

That is not what I meant. I want to build a new br binary based on the br version shipped in the pingcap/br:v6.5.3 image, with this PR merged in: "br: skip automatically get bucket region with other s3 compatible provider." by 3pointer (pingcap/tidb pull request #41889).

I want to know the br version corresponding to the current pingcap/br:v6.5.3 image.

| username: yiduoyunQ | Original post link

The BR version corresponding to the pingcap/br:v6.5.3 image is br:v6.5.3.

| username: redgame | Original post link

find / -name libresolv.so.2
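If the `find` utility is not available where you want to look, a rough Python equivalent is sketched below. The log reports the library as needed by /var/lib/br-bin/br, so the search is most informative when run inside the container that actually executes that binary. The function name `find_files` is hypothetical, and unlike `find -name` this version matches files only, not directories.

```python
import os


def find_files(root, name):
    """Walk `root` and return every path whose file name equals `name`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    hits = find_files("/", "libresolv.so.2")
    if hits:
        print("\n".join(hits))
    else:
        print("libresolv.so.2 not found under /")
```

An empty result in the container that runs br would be consistent with the loader error in the log.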

| username: cassblanca | Original post link

Careful study brings surprises.