Error in Restoring After TiDB Backup

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 备份后恢复错误

| username: myzz

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.0

Here’s the situation: I mounted an S3 bucket as a file system on every machine in the cluster. In this example the mount point is /mnt/tidb/.

On the control machine, I ran tiup bench tpcc -H {tidb} -P 4000 -D tpcc -U root --password '{password}' --warehouses 100 --threads 64 prepare to load the test data.

After that, I executed the command tiup br backup full -s local:///mnt/tidb/2023-04-12-15-45 --pd {pd-server}:2379 --log-file backup.log on the control machine to perform the backup. After the backup was completed, I used tiup bench tpcc cleanup to clear the data.

Later, when I used tiup br restore full -s local:///mnt/tidb/2023-04-12-15-45 --pd {pd-server}:2379 --log-file backup.log to restore, I encountered errors like “download sst failed” and “No such file or directory.”

Is this backup method feasible? Are there any other issues that need to be resolved?

| username: 大鱼海棠 | Original post link

I’m not very familiar with how S3 works. Can you confirm that every TiKV node can see the complete list of backup files during the restore?
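One way to check this is to take a listing of the backup directory on every TiKV host and compare the results. The Python sketch below walks a directory and prints each file's size and relative path, sorted so that listings from different hosts can be diffed. It uses only the standard library; the default path is the one from the question, and `build_manifest` is a hypothetical helper, not part of BR or TiUP.

```python
import os
import sys


def build_manifest(root: str) -> dict[str, int]:
    """Map each file's path (relative to root) to its size in bytes."""
    manifest: dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = os.path.getsize(full)
    return manifest


def main() -> None:
    # Default to the backup path from the question; pass another path as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/tidb/2023-04-12-15-45"
    manifest = build_manifest(root)
    sst_count = sum(1 for rel in manifest if rel.endswith(".sst"))
    # One "size<TAB>path" line per file, sorted so outputs from different hosts diff cleanly.
    for rel in sorted(manifest):
        print(f"{manifest[rel]}\t{rel}")
    # Summary goes to stderr so it does not pollute the listing on stdout.
    print(f"# {len(manifest)} files, {sst_count} .sst files", file=sys.stderr)


if __name__ == "__main__":
    main()
```

Run it on each TiKV host, redirect stdout to a file, and diff the files: a line present on one host but absent (or with a different size) on another points to a file that host cannot fully see.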

| username: Billmay表妹 | Original post link

There may be several things to check:

  1. Backup and restore paths: In your backup and restore commands, you are using the local file system path /mnt/tidb/2023-04-12-15-45 instead of an S3 path. This may cause the backup and restore to fail, because the backup and restore commands default to using S3 storage. If you want to use the local file system for backup and restore, you need to specify the -s local parameter in the command, for example:
     tiup br backup full -s local:///mnt/tidb/2023-04-12-15-45 --pd {pd-server}:2379 --log-file backup.log
     tiup br restore full -s local:///mnt/tidb/2023-04-12-15-45 --pd {pd-server}:2379 --log-file backup.log
  2. Backup and restore permissions: Since you are using a local file system path for backup and restore, you need to make sure that the TiDB and TiKV processes have sufficient permissions to read and write these files. You can try changing the file permissions with chmod, for example:
     chmod -R 777 /mnt/tidb/2023-04-12-15-45
  3. Data consistency during backup and restore: During your backup and restore process, you used tiup bench tpcc to prepare data and tiup bench tpcc cleanup to clear it. This may cause data inconsistency between the backup and the restore, leading to a restore failure. To avoid this, it is recommended to stop the TiDB cluster before the backup and to clear the data in the TiDB cluster before the restore.

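To check the permissions point without opening everything up with 777, a sketch like the following lists entries under the backup directory that the current user cannot read (or, for directories, cannot traverse). It focuses on the read access a restore needs and assumes you run it as the same OS user the TiKV process runs as, because `os.access` checks the real user and group IDs. `find_unreadable` is a hypothetical helper name, and the default path is the one from the question.

```python
import os
import sys


def find_unreadable(root: str) -> list[str]:
    """List paths under root that the current (real) user cannot read.

    Directories also need execute permission to be traversed.
    """
    # A missing or inaccessible root is reported on its own.
    if not os.access(root, os.R_OK | os.X_OK):
        return [root]
    bad: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.X_OK):
                bad.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/tidb/2023-04-12-15-45"
    for p in find_unreadable(target):
        print(p)
```

An empty output means every file and directory under the path is readable by that user; any printed path is a candidate for a targeted permission fix.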
| username: myzz | Original post link

Now, when I back up to S3 directly, I get this error instead:
{ kind: Other, error: "failed to put object rusoto error Request ID: None Body: \n\t-148668\n\tvalidate initidx and partlist [tideswing-tidb-backup:3af260f3-c3d9-43c5-a0c0-88c5afd04e97] failed, err: inconsistency filesize\n\t54e9c553-da2e-4412-9534-224f4e975565\n" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error

| username: myzz | Original post link

I checked all the nodes: each one can access the S3-mounted directory and see the backup files, but the restore still reports that SST files are missing.

| username: myzz | Original post link

I first used tiup bench tpcc to prepare the data and then ran the backup. After the backup finished, I ran cleanup to clear the data. When I tried to restore right afterward, it reported “SST File Not found.” Is this related to how the data was prepared? Or do I need to prepare a new cluster for the restore?

| username: 大鱼海棠 | Original post link

Every node must be able to see the complete set of SST backup files.
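Once you have collected a listing of the backup directory from each node (relative path and size per file), a comparison like the Python sketch below flags files that a node is missing or sees with a different size. Treating the largest observed size as the reference is an assumption (a truncated copy is usually the smaller one); the node names and file names in the example are made up for illustration, and `compare_manifests` is a hypothetical helper.

```python
def compare_manifests(manifests: dict[str, dict[str, int]]) -> dict[str, list[str]]:
    """Return, per node, the files it is missing or sees with a different size.

    `manifests` maps a node name to {relative_path: size_in_bytes}.
    Nodes with no problems are left out of the result.
    """
    # Union of every file seen on any node; assume the largest size is the complete one.
    reference: dict[str, int] = {}
    for files in manifests.values():
        for path, size in files.items():
            reference[path] = max(size, reference.get(path, 0))

    problems: dict[str, list[str]] = {}
    for node, files in manifests.items():
        issues: list[str] = []
        for path in sorted(reference):
            if path not in files:
                issues.append(f"missing {path}")
            elif files[path] != reference[path]:
                issues.append(f"size {files[path]} != {reference[path]} for {path}")
        if issues:
            problems[node] = issues
    return problems


if __name__ == "__main__":
    # Hypothetical listings from two TiKV hosts.
    example = {
        "tikv-1": {"1_2_3.sst": 100, "backupmeta": 10},
        "tikv-2": {"1_2_3.sst": 100},
    }
    print(compare_manifests(example))
    # → {'tikv-2': ['missing backupmeta']}
```

An empty result means every node sees the same set of files with the same sizes, which is the condition described above.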