Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: BACKUP 备份失败有什么原因? 表数据较大时很多时候出现失败 (What causes BACKUP to fail? It often fails when the table data is large)
[TiDB Usage Environment] Production Environment / Testing / PoC
[Encountered Issue: Problem Phenomenon and Impact] When I execute a BACKUP statement, the progress reaches 99%+ and then the task suddenly disappears from SHOW BACKUPS; the backup never succeeds. What could be causing this? It happens with larger tables and databases.
Check the resource usage during the backup to see if resource exhaustion is causing the issue, and also review the relevant logs.
You can check the TiDB logs to see if there are any error messages or other issues.
Are there no error logs from the failed backup?
How about trying to use BR directly?
What does the log of the backup failure say?
Blind guess: resource busy, need to look at the specific error.
Did the issue occur when backing up the entire database or just the large databases and tables? The problem is unclear, and there are no logs either.
Post the logs and take a screenshot of the resource usage.
It turns out to be an experimental feature. I subconsciously thought it was BR.
According to the description here, the connection to the TiDB node has probably timed out, or the node has reported an error. It seems we do need to check the TiDB node logs.
You need to look at the specific logs.
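For anyone who wants to narrow the logs down before posting them, here is a minimal sketch in Python. It assumes the TiDB log uses the bracketed `[time] [LEVEL] [source] ["message"] ...` line layout and that backup-related entries contain the word "backup"; both are assumptions, and the function name `backup_problems` and the file name in the usage comment are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed line layout: [time] [LEVEL] [source] ["message"] [key=value] ...
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<rest>.*)$")

# Levels worth looking at first when a backup disappears without a result.
SEVERE = {"WARN", "ERROR", "FATAL"}


def backup_problems(
    lines: Iterable[str], keyword: str = "backup"
) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, level, rest) for WARN/ERROR/FATAL lines mentioning keyword."""
    needle = keyword.lower()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if match["level"] in SEVERE and needle in match["rest"].lower():
            yield match["time"], match["level"], match["rest"]


# Hypothetical usage; the log path depends on your deployment:
# with open("tidb.log", encoding="utf-8", errors="replace") as fh:
#     for ts, level, rest in backup_problems(fh):
#         print(ts, level, rest)
```

The keyword match is deliberately loose, so it may also return unrelated lines; the entries around the time the task vanished from SHOW BACKUPS are the ones worth posting.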
Based on my experience with BR backups, there is a verification step after the backup data has been written, and it can be quite time-consuming. You can check the IO and CPU usage during the stalled period in Grafana → ****-TiKV-Details → Cluster → MBps / CPU. If there is a significant and sustained increase, it is probably still running the verification. In that case, you might want to wait a bit longer.
Please provide the backup parameters and error logs for review.