[TiDB Usage Environment] POC
[TiDB Version] 7.5
[Reproduction Path]
[Encountered Issue]
In our test environment we imported about 1.5 TB of data, which came to about 500 GB after backup. When restoring from Alibaba Cloud OSS to the POC environment, some smaller databases restore successfully, but databases of several hundred gigabytes show no progress and stay in a restoring state. On the backend, some tables have been created but contain no data, and others have not been created at all. The restore appears to hang with no response.

The nodes in the test environment are different from those in the POC environment. Data is backed up from the test environment and restored to the POC environment. The POC environment consists of 3 PD, 4 TiDB, and 5 TiKV nodes.

What method did you use for the backup? Have you confirmed that the backup was successful?

I ran BACKUP DATABASE as a SQL statement and confirmed that the backup succeeded.

I also encountered a situation where it couldn’t be restored and was stuck. There were no useful errors in the logs. Later, I found that the disk usage of TiKV had exceeded 80%. After cleaning up the TiKV disk, it could be restored.
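To rule out the disk-usage cause described above, per-store capacity can be checked from SQL. This is a minimal sketch, assuming a MySQL client session connected to any TiDB node of the target cluster. The 80% figure is this poster's observation, not a threshold the query checks for you.

```sql
-- List each TiKV store with its total and remaining disk space.
-- Compare AVAILABLE against CAPACITY for every store; a store that is
-- nearly full is a candidate cause for a restore that stops making progress.
SELECT STORE_ID, ADDRESS, STORE_STATE_NAME, CAPACITY, AVAILABLE
FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS;
```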

Disk space is sufficient. One database is only a little over 200 MB, but it contains more than 100,000 tables and cannot be restored.

How long did the restore run? Did any data arrive at all, or was there none from start to finish?

Take a look at the logs and post them here for everyone to review.

It looks like DDL execution is stuck in your case. In such situations, you can run ADMIN SHOW DDL to find out which TiDB node is the DDL owner (leader), and then look at the logs of that TiDB node.
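The check suggested above might look like the following sketch, run from a MySQL client connected to the POC cluster. The limit of 20 recent jobs is an arbitrary choice for illustration.

```sql
-- Show the current DDL owner and any DDL job it is running.
-- The owner's address tells you which TiDB node's log to read.
ADMIN SHOW DDL;

-- Show queued and running DDL jobs plus the 20 most recent finished ones.
-- A CREATE TABLE job that stays unfinished across repeated runs
-- suggests that table creation is where the restore is stuck.
ADMIN SHOW DDL JOBS 20;
```

Running the second statement a few times, some minutes apart, helps distinguish a slow but advancing queue from one that is not moving at all.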

Additionally, if split-table is set to true, every newly created table triggers a Region split. Since your restore has to create tens of thousands of tables, this could be a source of the problem. You might want to try turning it off first.
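Before changing anything, the current value of the setting can be read from SQL. This is a sketch that assumes split-table is exposed through SHOW CONFIG under that name on your version. Changing it is normally done in the TiDB configuration file, which is not shown here.

```sql
-- Report the split-table setting on every TiDB instance in the cluster.
-- All TiDB nodes should show the same value, because any of them
-- may end up creating tables during the restore.
SHOW CONFIG WHERE type = 'tidb' AND name = 'split-table';
```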

After reading through the whole thread, I still can't tell whether you are restoring with BR or with TiDB Lightning in local mode. From the third log, it looks like the DDL is stuck. Try restarting the TiDB server.

First, check whether the backup was successful. Second, check the restore method: whether the restore command is correct and whether the file permissions are correct.

Run ADMIN SHOW DDL to check whether a table creation statement is stuck. Also make sure the servers have sufficient resources throughout the restore, so that resource exhaustion does not stall it.

