br restore reports that the SST file cannot be found

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br restore 恢复提示找不到sst文件

| username: lxzkenney

TiDB version: 5.0.3
BR version: 5.0.6

The backup completed successfully:

[2022/07/17 16:59:27.778 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=#tableau_1946_sid_887200bb_4_group]
[2022/07/17 16:59:27.778 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=rpt_hz_follow_info_d]
[2022/07/17 16:59:27.779 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=rpt_sdb_ss_contact_trans_d]
[2022/07/17 16:59:27.818 +08:00] [INFO] [client.go:225] ["save backup meta"] [path=local:///data0/full_bak] [size=27003534]
[2022/07/17 16:59:27.837 +08:00] [INFO] [client.go:510] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/07/17 16:59:27.838 +08:00] [INFO] [client.go:510] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/07/17 16:59:27.838 +08:00] [INFO] [collector.go:67] ["Full backup success summary"] [total-ranges=3242] [ranges-succeed=3242] [ranges-failed=0] [backup-fast-checksum=937.544344ms] [backup-checksum=12m45.788892029s] [backup-total-ranges=3242] [total-take=51m13.727171783s] [BackupTS=434646625601454081] [total-kv=4765129225] [total-kv-size=1.176TB] [average-speed=382.7MB/s] ["backup data size(after compressed)"=250GB]

Restore command:

nohup /opt/software/tidb-toolkit-v5.0.6-linux-amd64/bin/br restore full --pd 10.100.2.55:2379 -s local:///data0/new_fullbak/full_bak --ratelimit 128 --log-file backupfull2022_recover1.log > /tmp/br_tidb_rpt2022_recover1.log 2>&1 &

During the restore, the following error occurred. The message makes it clear that a backup SST file could not be downloaded:
["br failed"] [error="No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory

Full backup data:

(I tried both owner tidb and owner root, each with mode 777.)

  1. With the NFS share mounted locally on the 4 TiKV nodes, the above error occurred.
  2. With the full backup copied to the local disk of every TiKV node in the cluster, the same error occurred.

For details, see the attachment:
backupfull2022_recover1.log (1.3 MB)

| username: Cabbager | Original post link

Is /data0/new_fullbak a directory mounted on an NFS share?
Have all the TiKV/TiDB/PD instances mounted this NFS share?

| username: lxzkenney | Original post link

/data0/full_bak is the mounted directory.
The /data0/new_fullbak/full_bak directory has now been copied to the local disk of each TiKV node (4 nodes, full backup on each). The problem has not been resolved.

| username: Cabbager | Original post link

Refer to this and give it a try:

  • TiKV and backup_endpoint nodes need to have the same backup directory, such as /home/tidb/backup_local,
    because the backup_endpoint node needs to store backupmeta.

| username: lxzkenney | Original post link

I followed the documentation requirements.

Backup machine:

Four TiKV machines:

| username: Cabbager | Original post link

Try this guide.

| username: lxzkenney | Original post link

I checked everything against this guide and didn’t find any differences. Could packet loss during the copy have left some SST files incomplete? The earlier backup to NFS was unsuccessful, so I backed up to the local disk of each TiKV node, then copied the files to NFS, and used NFS for the restore.
I will back up again directly to NFS and retry the restore once that backup is complete.
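
One way to rule out an incomplete copy is to compare checksums of the source and the copied directory. The following is only a sketch: the function name `same_contents` is made up for illustration, it assumes GNU `sha256sum` is installed, and on a backup of this size the hashing will take a while.

```shell
# Return 0 when both directories contain the same files with the same
# SHA-256 checksums, 1 when they differ, 2 when a directory is missing.
same_contents() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ -d "$dst" ] || return 2
    # Hash every file relative to its directory root, sorted by file name.
    src_sums=$(cd "$src" && find . -type f -exec sha256sum {} + | sort -k 2)
    dst_sums=$(cd "$dst" && find . -type f -exec sha256sum {} + | sort -k 2)
    [ "$src_sums" = "$dst_sums" ]
}

# Hypothetical usage with the paths from this thread:
# same_contents /data0/full_bak /data0/new_fullbak/full_bak && echo "copy is complete"
```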

| username: Kongdom | Original post link

  1. Is the copy on each of the four TiKV nodes at local:///data0/new_fullbak/full_bak?
  2. Which user performed the copy? Is it possible that the operating user does not have read and write permissions on the directory?

| username: lxzkenney | Original post link

  1. All four directories are local:///data0/new_fullbak/full_bak.
  2. The root user copied them, and then I changed the owner to the tidb user with chown tidb:tidb -R /data0/new_fullbak/full_bak.

| username: banana_jian | Original post link

The files named in the logs seem to be different from the ones in your screenshot. It looks like these files were not found:
11585778_302143638_206841_f5418051facea0901fd4ffdc853637e9d387f2e7813fe227f6de454f4bd68e68_1658046521336_default.sst,
11585778_302143638_206841_f5418051facea0901fd4ffdc853637e9d387f2e7813fe227f6de454f4bd68e68_1658046521336_write.sst
4_302123137_206841_c8b363e0813f2175187dd77c0a148cfbb7d4545afa519d932ba75973311aad56_1658046520346_default.sst,
4_302123137_206841_c8b363e0813f2175187dd77c0a148cfbb7d4545afa519d932ba75973311aad56_1658046520346_write.sst
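
To confirm whether the files named in the log exist on a given node, a small check like the one below could be run there. This is a sketch only; `report_missing` is a hypothetical helper, not part of BR.

```shell
# Print each expected file name that is absent from the backup directory.
# Returns 0 when nothing is missing, 1 otherwise.
report_missing() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    return $status
}

# Hypothetical usage with one of the file names from the log:
# report_missing /data0/new_fullbak/full_bak \
#     4_302123137_206841_c8b363e0813f2175187dd77c0a148cfbb7d4545afa519d932ba75973311aad56_1658046520346_default.sst
```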

| username: lxzkenney | Original post link

The screenshot shows only 5 of them; there are 2,000 SST files.

| username: lxzkenney | Original post link

After backing up directly to NFS, the restore still reports the same error.
Screenshot of the successful backup:

Log of the failed restore:
backupfull2022_recover6.log (4.4 MB)

| username: lxzkenney | Original post link

I found the cause, and it was my own mistake: the NFS shared directory was not mounted on the TiFlash node. The mount is not needed on that node during backup, but it is needed during restore. That is why the restore kept reporting that the SST file could not be found, while the backup had no issues.
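
A quick way to catch this kind of omission is to count the SST files visible under the backup path on every storage node, TiFlash included, before starting the restore. The sketch below is an illustration only: `count_sst` is a made-up helper and the host names in the usage comment are placeholders, not the real nodes of this cluster.

```shell
# Print the number of .sst files found under a backup directory.
# Prints 0 and returns 1 when the directory is missing or unreadable.
count_sst() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
        echo 0
        return 1
    fi
    find "$dir" -type f -name '*.sst' | wc -l | tr -d ' '
}

# Hypothetical usage: every host should report the same non-zero count.
# for host in tikv1 tikv2 tikv3 tikv4 tiflash1; do
#     printf '%s: ' "$host"
#     ssh "$host" "find /data0/new_fullbak/full_bak -type f -name '*.sst' | wc -l"
# done
```

A node that reports 0, or an error, does not see the backup and would fail the download step in the same way as above.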

Thanks to all the experts for your help! :pray::pray::pray:

| username: ShawnYan | Original post link

:+1: Glad it got resolved.
