BR restore fails because it cannot find the SST files

translator_bot · June 23, 2024, 9:40am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br restore 恢复提示找不到sst文件

| username: lxzkenney

TiDB version: 5.0.3
BR version: 5.0.6

The backup was successfully completed:

[2022/07/17 16:59:27.778 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=#tableau_1946_sid_887200bb_4_group]
[2022/07/17 16:59:27.778 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=rpt_hz_follow_info_d]
[2022/07/17 16:59:27.779 +08:00] [INFO] [client.go:1043] ["checksum success"] [database=sdhz_rpt] [table=rpt_sdb_ss_contact_trans_d]
[2022/07/17 16:59:27.818 +08:00] [INFO] [client.go:225] ["save backup meta"] [path=local:///data0/full_bak] [size=27003534]
[2022/07/17 16:59:27.837 +08:00] [INFO] [client.go:510] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/07/17 16:59:27.838 +08:00] [INFO] [client.go:510] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/07/17 16:59:27.838 +08:00] [INFO] [collector.go:67] ["Full backup success summary"] [total-ranges=3242] [ranges-succeed=3242] [ranges-failed=0] [backup-fast-checksum=937.544344ms] [backup-checksum=12m45.788892029s] [backup-total-ranges=3242] [total-take=51m13.727171783s] [BackupTS=434646625601454081] [total-kv=4765129225] [total-kv-size=1.176TB] [average-speed=382.7MB/s] ["backup data size(after compressed)"=250GB]

Restore command:

nohup /opt/software/tidb-toolkit-v5.0.6-linux-amd64/bin/br restore full --pd 10.100.2.55:2379 -s local:///data0/new_fullbak/full_bak --ratelimit 128 --log-file backupfull2022_recover1.log > /tmp/br_tidb_rpt2022_recover1.log 2>&1 &

The restore failed with the following error, which indicates that a backup SST file could not be downloaded:
["br failed"] [error="No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory

Full backup data:

(Tried with owner tidb and with owner root, both with mode 777.)

With the NFS share mounted on the four TiKV nodes, the restore failed with the error above.
After copying the full backup to local disk on every TiKV node in the cluster, the restore failed with the same error.

For details, see the attachment:
backupfull2022_recover1.log (1.3 MB)

| username: Cabbager

Is /data0/new_fullbak a directory mounted on an NFS share?
Have all the TiKV/TiDB/PD instances mounted this NFS share?

| username: lxzkenney

/data0/full_bak is the mount directory.
The /data0/new_fullbak/full_bak directory has now been copied to local disk on the TiKV nodes (4 nodes, each with the full backup). The problem persists.

| username: Cabbager

Refer to this and give it a try:

TiKV and backup_endpoint nodes need to have the same backup directory, such as /home/tidb/backup_local.
Because the backup_endpoint node needs to store backupmeta.

| username: lxzkenney

I followed the documentation requirements.

Backup machine:

Four TiKV machines:

| username: Cabbager

Try this guide.

| username: lxzkenney

I checked against this guide and found no differences. Could packet loss during the copy have left some SST files incomplete? The earlier backup directly to NFS failed, so I backed up to local disk on each TiKV node, copied the files to NFS, and then restored from NFS.
I will run the backup again directly to NFS and retry the restore once it completes.
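
The incomplete-copy hypothesis above can be tested without a new backup. The following is a minimal sketch, not part of BR: `diff_ssts` is a hypothetical helper name, and it assumes both the original backup directory and its copy are readable from the machine where it runs. It prints every `.sst` file whose copy is missing or differs byte for byte from the original.

```shell
# Print the relative path of every .sst file under SRC whose copy under DST
# is missing or not byte-identical. Prints nothing when the copy is complete.
# Usage: diff_ssts SRC DST   (helper name is hypothetical, not a BR command)
diff_ssts() {
  local src=$1 dst=$2
  # List the SST files relative to SRC so the same path can be checked under DST.
  (cd "$src" && find . -type f -name '*.sst') | sort | while IFS= read -r rel; do
    # cmp -s exits non-zero when the files differ or one of them is missing.
    cmp -s "$src/$rel" "$dst/$rel" 2>/dev/null || printf '%s\n' "${rel#./}"
  done
}
```

For example, `diff_ssts /data0/full_bak /data0/new_fullbak/full_bak` would compare the original backup with its copy, assuming those are the two locations involved. Empty output means every SST file was copied intact.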

| username: Kongdom

Is the backup located at local:///data0/new_fullbak/full_bak on all four TiKV nodes?
Which user performed the copy? Could the user running the restore lack read and write permissions on the directory?

| username: lxzkenney

All four directories are local:///data0/new_fullbak/full_bak
The root user copied them, and then I changed the owner to the tidb user using chown tidb:tidb -R /data0/new_fullbak/full_bak

| username: banana_jian

The files named in the log seem to differ from the ones in your screenshot. It looks like these files were not found:
11585778_302143638_206841_f5418051facea0901fd4ffdc853637e9d387f2e7813fe227f6de454f4bd68e68_1658046521336_default.sst,
11585778_302143638_206841_f5418051facea0901fd4ffdc853637e9d387f2e7813fe227f6de454f4bd68e68_1658046521336_write.sst
4_302123137_206841_c8b363e0813f2175187dd77c0a148cfbb7d4545afa519d932ba75973311aad56_1658046520346_default.sst,
4_302123137_206841_c8b363e0813f2175187dd77c0a148cfbb7d4545afa519d932ba75973311aad56_1658046520346_write.sst
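
One way to confirm this on a given node is to check for the named files directly. The sketch below is only an illustration: `missing_ssts` is a hypothetical helper name, and the backup directory is whatever path the restore command's `-s local://...` points to on that node. It prints each listed file that is absent or empty in the directory.

```shell
# Print every file name from the argument list that is missing from, or
# empty in, the given backup directory.
# Usage: missing_ssts BACKUP_DIR NAME...   (helper name is hypothetical)
missing_ssts() {
  local dir=$1
  shift
  local name
  for name in "$@"; do
    # -s is true only for a file that exists and has a size greater than zero.
    [ -s "$dir/$name" ] || printf '%s\n' "$name"
  done
}
```

Running it on each node that stores data, with the backup directory and the file names quoted from the restore log, shows which node cannot see which file. Empty output means that node can see all of them.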

| username: lxzkenney

The screenshot only shows 5 of them; there are 2,000 SST files.

| username: lxzkenney

After backing up again directly to NFS, the restore still reports the same error.
Screenshot of the successful backup:

Log of the failed restore:
backupfull2022_recover6.log (4.4 MB)

| username: lxzkenney

I found the cause; it was my mistake. The NFS shared directory was not mounted on the TiFlash node. The mount is not needed there during backup, but it is needed during restore, which is why the restore kept reporting that SST files could not be found while the backup itself had no issues.

Thanks to all the experts for your help!
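
A pre-restore check along these lines could catch a node that lacks the backup directory. This is a rough sketch under stated assumptions: `check_backup_dir` and the `REMOTE_SH` variable are hypothetical names, the host names in the usage example are placeholders, and it assumes passwordless `ssh` from the machine running BR to every TiKV and TiFlash node. It looks for the `backupmeta` file that BR saves in the backup directory.

```shell
# Command used to run a check on a remote host. ssh by default; the variable
# name is an assumption of this sketch and can be overridden for testing.
REMOTE_SH=${REMOTE_SH:-ssh}

# Report, per host, whether the backup directory (identified by its
# backupmeta file) is visible there. Returns non-zero if any host lacks it.
# Usage: check_backup_dir BACKUP_DIR HOST...
check_backup_dir() {
  local dir=$1
  shift
  local host status=0
  for host in "$@"; do
    # stdin is redirected so the remote command cannot consume the caller's input.
    if "$REMOTE_SH" "$host" "test -s '$dir/backupmeta'" </dev/null; then
      printf 'OK      %s\n' "$host"
    else
      printf 'MISSING %s\n' "$host"
      status=1
    fi
  done
  return "$status"
}
```

A call such as `check_backup_dir /data0/new_fullbak/full_bak tikv1 tikv2 tikv3 tikv4 tiflash1` (placeholder host names) would print one line per node. In the situation described in this thread, the TiFlash node would be the one reported as MISSING.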

| username: ShawnYan

Glad it got resolved.
