BR Restore Failed, Constantly Reporting Error: rpc error: code = Unavailable desc = transport is closing

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: BR恢复失败,一直报错Error: rpc error: code = Unavailable desc = transport is closing

| username: 舞动梦灵

Backup taken with BR 4.0.9, restoring to a version 5.0.0 cluster. The 200G test database restored without any issues, but this 2T restore keeps failing.

This is my first restore with br. It failed once, so I dropped the entire database and restored again, and hit this error. The official documentation says it is caused by insufficient resources and suggests lowering concurrency and the rate limit. I have been lowering both parameters, but it doesn't seem to help; the lower I set them, the faster the error occurs.

The restore command used:

tiup br restore full --pd "192.168.100.30:2379" -s "local:///mnt/tidb1bak" --ratelimit 70 --concurrency 4 --log-file /mnt/tidb1resotryfull2.log

I’ve already restored 3 times. Lowering concurrency to match the number of CPU cores still doesn’t work.

| username: wangccsy | Original post link

Did BR also start a transaction?

| username: 舞动梦灵 | Original post link

What transaction? I'm running the restore command inside a screen session. Which one are you referring to?

| username: 舞动梦灵 | Original post link

Is anyone there? Does anyone know how to solve this error?
Error: rpc error: code = Unavailable desc = transport is closing

| username: 像风一样的男子 | Original post link

How about trying to back up less data?

| username: 舞动梦灵 | Original post link

I had no problem restoring a 200G backup of the test database, but I’m having issues with this 2T restore. That’s a disaster, especially since there’s a 17T backup still being transferred that will also need to be restored. My server configuration is as follows:

| Server | Memory (GB) | CPU cores | Disk |
| --- | --- | --- | --- |
| tidb1 | 16 | 4 | 100G |
| tikv1-3 | 12 | 4 | 2900G |

| username: 连连看db | Original post link

Either set concurrency to 1, or remove ratelimit. There is a bug with ratelimit in BR versions below 5.0.

| username: 舞动梦灵 | Original post link

If ratelimit is removed, does it default to the maximum value?
Then I'll try two runs: one without rate limiting, and one with concurrency set to 1, right?
tiup br restore full --pd "192.168.100.30:2379" -s "local:///mnt/tidb1bak" --log-file /mnt/tidb1resotryfull2.log
tiup br restore full --pd "192.168.100.30:2379" -s "local:///mnt/tidb1bak" --concurrency 1 --log-file /mnt/tidb1resotryfull2.log

| username: 连连看db | Original post link

  • Do not enable ratelimit.
  • Or remove both parameters.

First check whether the restore can proceed at all, then worry about rate limiting.

| username: 舞动梦灵 | Original post link

Okay, I'll give it a try. The first time, with ratelimit set to 128, the restore seemed to succeed: the 2.1T backup wrote 2.4T of data onto the nodes, but it still ended with an error.

| username: 路在何chu | Original post link

For a cross-version restore, you should back up and restore database by database.

| username: 舞动梦灵 | Original post link

Back up and restore by database? What does that mean? Back up each database separately and restore them one by one?

| username: 路在何chu | Original post link

Backing up multiple databases is also possible.

| username: 舞动梦灵 | Original post link

I wanted to do the same, but it turns out that of the dozen or so databases, about 95% of the data is in one of them. In the 2.1T backup, one database is 2T and the other ten or so together are 100G; in the 17T backup, one database is 14T and the remaining ten take up 3T. :joy:

| username: 路在何chu | Original post link

Your database must have some very large tables. Back up those large tables separately, then back up the remaining tables together.

| username: 路在何chu | Original post link

Use -f to specify the database name and table name for backup.
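The advice above can be sketched with BR's table filter flag. A hedged example, assuming hypothetical names `big_db.big_table` for the large table, and local storage paths in the same style as earlier in the thread (adjust to your actual schema and paths):

```shell
# Back up one large table on its own.
# 'big_db.big_table' is a hypothetical name for illustration.
tiup br backup full --pd "192.168.100.30:2379" \
    -s "local:///mnt/tidb1bak_bigtable" \
    -f 'big_db.big_table'

# Back up everything else, excluding that table with a '!' exclusion rule.
tiup br backup full --pd "192.168.100.30:2379" \
    -s "local:///mnt/tidb1bak_rest" \
    -f '*.*' -f '!big_db.big_table'
```

Each backup then restores independently with `tiup br restore full` against its own `-s` path, so a failure in one batch doesn't force re-restoring everything.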

| username: 舞动梦灵 | Original post link

It doesn't seem to work.
Even with no extra parameters, it reports that a TiKV node is disconnected and the restore fails with an error right away.


| username: 舞动梦灵 | Original post link

There isn't enough time now. :joy: The project is planned to finish by the end of January. The transfer speed is only 4MB per second, and since the backup is all small files the speed can't be increased, so the transfer is very slow. Handling each table individually would probably take even longer. Unless we have no other choice in the end.

| username: 像风一样的男子 | Original post link

Your transfer speed is slower than using an external hard drive.

| username: 舞动梦灵 | Original post link

I'd like to do that too. We're transferring from Alibaba Cloud to the local data center in our office. :joy: I asked the operations team, and they said it can't go any faster because there are too many small files.