BR Restore Failed: How to Delete All Restored Data and Restore Again

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: BR恢复失败,怎么删除所有恢复的数据,重新恢复呢

| username: 舞动梦灵

How to delete all restored data and restore again after BR restore fails?

| username: 像风一样的男子 | Original post link

Why don’t you just delete the restored database tables?

| username: 舞动梦灵 | Original post link

So just directly drop all the restored databases and tables, right? Then will the local space be cleaned up when GC runs?

| username: 像风一样的男子 | Original post link

After GC, it will be automatically cleaned up.
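
If you want to check when GC will reclaim the space, you can look at the GC bookkeeping in the mysql.tidb system table. A quick sketch (the connection details are placeholders):

```bash
# tikv_gc_life_time and tikv_gc_run_interval control how soon deleted
# data becomes eligible for cleanup; tikv_gc_last_run_time shows the
# most recent GC run.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "
  SELECT VARIABLE_NAME, VARIABLE_VALUE
  FROM mysql.tidb
  WHERE VARIABLE_NAME LIKE 'tikv_gc%';"
```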

| username: 舞动梦灵 | Original post link

Bro, can you help me look at this issue? It reports a restore failure. Can I locate the one that failed and restore just that part separately? That way I wouldn't have to restore everything; a full restore takes 10 hours, which is too much trouble. :joy:

```
[2023/12/29 03:12:54.730 +08:00] [INFO] [collector.go:188] [“Full restore Failed summary : total restore files: 99378, total success: 99377, total failed: 1”] [“split region”=28m54.978152613s] [“restore ranges”=88703] [Size=2279671975326] [unitName=file] [error=“rpc error: code = Unavailable desc = transport is closing”] [errorVerbose=“rpc error: code = Unavailable desc = transport is closing\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).ingestSST\n\tgithub.com/pingcap/br@/pkg/restore/import.go:480\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\n\tgithub.com/pingcap/br@/pkg/restore/import.go:276\ngithub.com/pingcap/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/br@/pkg/utils/retry.go:46\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import\n\tgithub.com/pingcap/br@/pkg/restore/import.go:222\ngithub.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\n\tgithub.com/pingcap/br@/pkg/restore/client.go:584\ngithub.com/pingcap/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\tgithub.com/pingcap/br@/pkg/utils/worker.go:63\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357”]
[2023/12/29 03:12:54.730 +08:00] [ERROR] [restore.go:35] [“failed to restore”] [error=“rpc error: code = Unavailable desc = transport is closing”] [errorVerbose=“rpc error: code = Unavailable desc = transport is closing\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).ingestSST\n\tgithub.com/pingcap/br@/pkg/restore/import.go:480\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\n\tgithub.com/pingcap/br@/pkg/restore/import.go:276\ngithub.com/pingcap/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/br@/pkg/utils/retry.go:46\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import\n\tgithub.com/pingcap/br@/pkg/restore/import.go:222\ngithub.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\n\tgithub.com/pingcap/br@/pkg/restore/client.go:584\ngithub.com/pingcap/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\tgithub.com/pingcap/br@/pkg/utils/worker.go:63\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357”] [stack=“main.runRestoreCommand\n\tgithub.com/pingcap/br@/cmd/br/restore.go:35\nmain.newFullRestoreCommand.func1\n\tgithub.com/pingcap/br@/cmd/br/restore.go:120\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:56\nruntime.main\n\truntime/proc.go:203”]
[2023/12/29 03:12:54.730 +08:00] [ERROR] [main.go:58] [“br failed”] [error=“rpc error: code = Unavailable desc = transport is closing”] [errorVerbose=“rpc error: code = Unavailable desc = transport is closing\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).ingestSST\n\tgithub.com/pingcap/br@/pkg/restore/import.go:480\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\n\tgithub.com/pingcap/br@/pkg/restore/import.go:276\ngithub.com/pingcap/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/br@/pkg/utils/retry.go:46\ngithub.com/pingcap/br/pkg/restore.(*FileImporter).Import\n\tgithub.com/pingcap/br@/pkg/restore/import.go:222\ngithub.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\n\tgithub.com/pingcap/br@/pkg/restore/client.go:584\ngithub.com/pingcap/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\tgithub.com/pingcap/br@/pkg/utils/worker.go:63\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1357”] [stack=“main.main\n\tgithub.com/pingcap/br@/cmd/br/main.go:58\nruntime.main\n\truntime/proc.go:203”]
```

| username: wangccsy | Original post link

Try running the restore again.

| username: 像风一样的男子 | Original post link

That’s really unfortunate. It’s hard to tell which one has the issue. How about backing them up separately and restoring them one by one?

| username: dba远航 | Original post link

The question is which ones have already been restored.

| username: Jellybean | Original post link

Yes, directly drop all the databases and tables that were restored. After deleting them, you can re-run the BR restore to recover the data.

The local space will be cleaned up and released automatically after the system's GC runs.
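
Roughly like this; the database names and connection details below are placeholders, substitute your own:

```bash
# Drop the databases created by the failed restore, then rerun BR.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "
  DROP DATABASE IF EXISTS restored_db1;
  DROP DATABASE IF EXISTS restored_db2;"
# TiKV reclaims the deleted data in the background once GC runs;
# no manual cleanup of files on the TiKV nodes is needed.
```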

| username: Jellybean | Original post link

If you know which table failed to restore, BR Restore supports restoring data for a specific table. I recall that you just need to add the appropriate filter rules during the restoration process. You can refer to the official documentation for detailed instructions.
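
Something along these lines; the PD address, storage path, and database/table names are placeholders:

```bash
# Restore a single table from the backup:
br restore table \
  --pd "127.0.0.1:2379" \
  --db "mydb" \
  --table "mytable" \
  --storage "local:///data/br_backup" \
  --log-file restore_table.log

# Or restore a filtered set of tables from a full backup:
br restore full \
  --pd "127.0.0.1:2379" \
  --filter "mydb.*" \
  --storage "local:///data/br_backup" \
  --log-file restore_filter.log
```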

| username: Kongdom | Original post link

First, delete the databases, and then you can run the restore again.

Then, if you want to improve the restore speed, you can refer to the official documentation on BR restore performance tuning.

| username: 随缘天空 | Original post link

Isn't BR a full restore? Can't you just delete the existing data and start over?

| username: 舞动梦灵 | Original post link

How can I tell which table failed to restore? Where can I check for the relevant information? It only reports a failure, and I can't tell whether it is a table, a Region, or a local file. I want to restore just that one, since there are so many and only one failed.

| username: 舞动梦灵 | Original post link

It reported an error, and the official documentation says it's because the hardware performance is too low and recommends reducing the speed. :joy:

| username: Jellybean | Original post link

Check the failed information to see if there is any schema-related information.
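
For example, assuming the restore was run with --log-file br_restore.log (the file name is a placeholder), you can search the log for clues:

```bash
# Show ERROR lines with a few lines of context, looking for table or file names:
grep -n -B 3 -A 3 "ERROR" br_restore.log | less
# Narrow down to ingest/download failures, which usually mention the SST file involved:
grep -n -i -E "ingest|download|failed" br_restore.log | tail -n 50
```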

| username: 舞动梦灵 | Original post link

The information I posted is all there is. I can’t see any related information.

| username: 烂番薯0 | Original post link

Drop it?

| username: Kongdom | Original post link

:thinking: If the hardware performance is not up to standard, you can only try reducing the speed~

| username: 舞动梦灵 | Original post link

Is reducing the speed --retelimit? I saw this in the official documentation. I have already set --ratelimit 70 to limit it to 70 MiB/s, but why do I still see the TiKV server write speed exceeding 100 MiB/s?

| username: Kongdom | Original post link

Check the spelling. Also, is TiKV running on a dedicated server? The rate limit applies to each TiKV node, not to the whole cluster or to the server it runs on.

The --ratelimit option limits the speed at which each TiKV node executes restore tasks (in MiB/s).
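
For example, a full restore with the per-node rate limit would look roughly like this (the PD address and storage path are placeholders):

```bash
# --ratelimit caps each TiKV node at about 70 MiB/s, so with several TiKV
# nodes ingesting in parallel the cluster-wide write rate can still exceed 70.
br restore full \
  --pd "127.0.0.1:2379" \
  --storage "local:///data/br_backup" \
  --ratelimit 70 \
  --log-file restore_full.log
```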