Why does BACKUP fail? It often fails when table data is large

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: What are the reasons for BACKUP failure? It often fails when table data is large

| username: TiDBer_CQ

[TiDB Usage Environment] Production Environment / Testing / PoC
[Encountered Issue: Problem Phenomenon and Impact] When I run BACKUP, progress reaches over 99%, and then the task suddenly disappears from SHOW BACKUPS. The backup never succeeds. This happens with larger databases and tables.

| username: dba远航 | Original post link

Check the resource usage during the backup to see if resource exhaustion is causing the issue, and also review the relevant logs.

| username: WalterWj | Original post link

No errors reported?

| username: Miracle | Original post link

You can check the TiDB logs to see if there are any error messages or other issues.

| username: Kongdom | Original post link

Are there no error logs for the backup?

| username: DBAER | Original post link

How about trying to use BR directly?

| username: zhanggame1 | Original post link

What does the log of the backup failure say?

| username: WinterLiu | Original post link

Blind guess: resources are busy. We need to see the specific error, though.

| username: 连连看db | Original post link

Does this happen when backing up the whole cluster, or only with large databases and tables? The problem description is unclear, and there are no logs either. :man_shrugging:

| username: lemonade010 | Original post link

Post the logs and take a screenshot of the resource usage.

| username: Kongdom | Original post link

:joy: It turns out this is an experimental feature. I had assumed it was BR.

Based on the description, my guess is that the connection to the TiDB node timed out, or the node hit an error. We really do need to check the TiDB node logs.
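Since several replies point at the TiDB logs, here is a minimal Python sketch for pulling backup-related warnings and errors out of a tidb-server log. It assumes the unified log layout `[time] [LEVEL] [file:line] [message] [fields...]`; the function name `find_backup_problems` and the keyword filter are hypothetical choices for illustration, not part of any TiDB tooling.

```python
import re
from typing import Iterable, List

# Assumed layout of one tidb-server log line:
# [time] [LEVEL] [file:line] ["message"] [key=value] ...
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]*)\] (?P<rest>.*)$"
)


def find_backup_problems(
    lines: Iterable[str],
    keywords=("backup",),
    levels=("WARN", "ERROR", "FATAL"),
) -> List[dict]:
    """Return parsed log lines at the given levels that mention any keyword."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m is None:
            # Skip lines that do not follow the assumed layout (e.g. stack traces).
            continue
        if m.group("level") not in levels:
            continue
        # Case-insensitive keyword match on the source location and message/fields.
        text = (m.group("source") + " " + m.group("rest")).lower()
        if any(k.lower() in text for k in keywords):
            hits.append(m.groupdict())
    return hits
```

To use it, open the log of the tidb-server that executed the BACKUP statement (the path depends on your deployment) and pass the file object in; each hit is a dict with `time`, `level`, `source`, and `rest`. A plain `grep` for `ERROR` and `backup` does much the same job; the sketch just makes the filter explicit.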

| username: 小于同学 | Original post link

Do you have logs?

| username: TiDBer_ivan0927 | Original post link

You need to look at the specific logs.

| username: porpoiselxj | Original post link

Based on experience with BR backups, there is a verification (checksum) step after the backup data is written, and it can take quite a while. Check the IO and CPU usage during the period when progress stalls in Grafana → ****-TiKV-Details → Cluster → Mbps/CPU. If there is a significant and sustained increase, the verification is probably running, and you may just need to wait longer.
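The "significant and sustained increase" check can be made a little more concrete. The sketch below assumes you have exported per-interval samples of one metric (for example TiKV CPU or IO throughput) from Grafana, once for a quiet window before the stall and once for the stalled window. The 1.5× factor and 80% fraction are arbitrary illustrative thresholds, not values from BR or Grafana, and a positive result only suggests the checksum is running; it does not prove it.

```python
from statistics import median
from typing import Sequence


def looks_like_sustained_increase(
    baseline: Sequence[float],
    stalled: Sequence[float],
    factor: float = 1.5,       # hypothetical: how far above baseline counts as "significant"
    min_fraction: float = 0.8,  # hypothetical: share of stalled samples that must be high ("sustained")
) -> bool:
    """Heuristic: most stalled-window samples exceed factor x the baseline median."""
    if not baseline or not stalled:
        raise ValueError("need samples from both the baseline and the stalled window")
    threshold = median(baseline) * factor
    above = sum(1 for v in stalled if v > threshold)
    return above / len(stalled) >= min_fraction
```

Using the median of the baseline keeps one noisy spike before the stall from inflating the threshold, and requiring a fraction of samples (rather than the peak) is what separates a sustained load from a brief burst.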

| username: xmlianfeng | Original post link

Please provide the backup parameters and error logs for review.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.