Database cannot be restored - very urgent, please help

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据库恢复不了——好急,求助。

| username: TiDBer_4CqIpEXU

[TiDB Usage Environment] Poc
[TiDB Version] 7.5
[Reproduction Path]
[Encountered Issue]
In our test environment we imported about 1.5 TB of data, which came to about 500 GB after backup. When restoring from Alibaba Cloud OSS to the POC environment, the smaller databases restore successfully, but databases of several hundred gigabytes show no progress and stay in the restoring state. On the backend, some tables have been created but contain no data, and some tables have not been created at all. The restore appears to hang with no response.

The test environment and the POC environment have different node layouts. Data is backed up from the test environment and restored to the POC environment. The POC environment consists of 3 PD, 4 TiDB, and 5 TiKV nodes.

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
tikv.log.tail3000.grep-c-30-sst.log (417.5 KB)
tikv.log.tail3000.log (852.4 KB)
show.txt (33.1 KB)

| username: zhaokede | Original post link

No logs?

| username: xfworld | Original post link

What method did you use for the backup? Have you confirmed that the backup was successful?

| username: 不想干活 | Original post link

I didn’t see the attachment, please upload it again.

| username: TiDBer_4CqIpEXU | Original post link

Uploaded the latest logs from one of the nodes, please take a look.

| username: TiDBer_4CqIpEXU | Original post link

I've uploaded the latest logs from one of the nodes. Could you please take a look?

| username: TiDBer_4CqIpEXU | Original post link

I ran BACKUP DATABASE as a SQL statement and confirmed that the backup succeeded.

| username: tony5413 | Original post link

I also encountered a situation where it couldn’t be restored and was stuck. There were no useful errors in the logs. Later, I found that the disk usage of TiKV had exceeded 80%. After cleaning up the TiKV disk, it could be restored.
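
A quick way to run the disk check described above is sketched below. It is only an illustration: the 80% threshold comes from the post, while the function names and the example path are hypothetical, and the script would need to be run on each TiKV host against that host's actual data directory.

```python
import shutil

# Threshold taken from the post above (80%); adjust as needed.
DISK_USAGE_LIMIT = 0.80


def usage_ratio(total_bytes: int, used_bytes: int) -> float:
    """Return used/total as a fraction, or 0.0 when total is not positive."""
    if total_bytes <= 0:
        return 0.0
    return used_bytes / total_bytes


def is_over_limit(path: str, limit: float = DISK_USAGE_LIMIT) -> bool:
    """Return True when the filesystem holding `path` is fuller than `limit`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_ratio(usage.total, usage.used) > limit


if __name__ == "__main__":
    # Hypothetical path: replace "." with the TiKV data directory on this host.
    data_dir = "."
    usage = shutil.disk_usage(data_dir)
    ratio = usage_ratio(usage.total, usage.used)
    status = "over" if ratio > DISK_USAGE_LIMIT else "within"
    print(f"{data_dir}: {ratio:.1%} used, {status} the {DISK_USAGE_LIMIT:.0%} limit")
```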

| username: TiDBer_4CqIpEXU | Original post link

Thanks, I’ll check it out.

| username: TiDBer_4CqIpEXU | Original post link

The disk space is sufficient. One database is over 200 MB but contains more than 100,000 tables, and it cannot be restored.

| username: TIDB-Learner | Original post link

How long has the restore been running? Was there no data at all from start to finish?

| username: TiDBer_q2eTrp5h | Original post link

Take a look at the logs and post them here for everyone to review.

| username: 小于同学 | Original post link

Where are the logs?

| username: TiDBer_jYQINSnf | Original post link

It looks like DDL execution is stuck in your case. In such situations, you can run ADMIN SHOW DDL to find out which TiDB node is the DDL owner, and then check the logs of that TiDB node.

Additionally, if split-table is set to true, every newly created table triggers a Region split. Since your cluster needs tens of thousands of tables, this could be contributing to the problem. You might want to try turning it off first.
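
As a rough illustration of the DDL check suggested above, the sketch below lists DDL jobs that have not finished yet. It assumes a Python DB-API 2.0 cursor connected to TiDB through any MySQL-compatible client library. The function name is hypothetical, and both the STATE column name and the set of "finished" state values are assumptions that should be verified against the actual output of ADMIN SHOW DDL JOBS on your version.

```python
from typing import Any, FrozenSet, List, Tuple

# Assumption: these STATE values mean a DDL job is no longer pending.
FINISHED_STATES: FrozenSet[str] = frozenset({"synced", "cancelled", "rollback done"})


def pending_ddl_jobs(
    cursor: Any,
    state_column: str = "STATE",  # assumed column name in the result set
    finished: FrozenSet[str] = FINISHED_STATES,
) -> List[Tuple[Any, ...]]:
    """Return rows of ADMIN SHOW DDL JOBS whose state is not in `finished`."""
    cursor.execute("ADMIN SHOW DDL JOBS")
    # Locate the state column by name rather than by position.
    columns = [desc[0].upper() for desc in cursor.description]
    state_idx = columns.index(state_column.upper())
    return [
        tuple(row)
        for row in cursor.fetchall()
        if str(row[state_idx]).lower() not in finished
    ]
```

If the same create-table jobs stay in this list across repeated calls, that is consistent with the DDL being stuck, and the logs of the DDL owner node are the next place to look.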

| username: WinterLiu | Original post link

The person above is right. I learned something.

| username: 小龙虾爱大龙虾 | Original post link

After reading through the whole thread, I still can't tell whether you are restoring with BR or with Lightning local mode. Judging from the third log, the DDL seems to be stuck. Try restarting the TiDB server.

| username: Jack-li | Original post link

Has it been restored?

| username: zhh_912 | Original post link

First, check whether the backup succeeded. Second, check the restore method: whether the restore command is correct and whether the file permissions are correct.

| username: 随缘天空 | Original post link

Run ADMIN SHOW DDL to check whether the table creation statements are stuck. Also make sure the servers have sufficient resources during the restore, so that resource exhaustion does not stall it.

| username: jiayou64 | Original post link

Learning~~~