Backup Failure

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br 备份失败

| username: 兜不靠谱

【TiDB Usage Environment】Production Environment
【TiDB Version】5.4.0
【Reproduction Path】Backups used to succeed every time. As the business grew, they began to fail intermittently, and now they fail every time.
【Encountered Problem: Phenomenon and Impact】
Executing command:
br backup full \
  --pd "10.134.171.16:2379,10.134.169.224:2379,10.134.171.96:2379" \
  --storage "s3://aa-cn-aip-tidb-backup-1306458289/${timetag}full" \
  --s3.endpoint "http://cos.xxx.xxx.xxx.com" \
  --s3.region "xxxx-xxxxxxx" \
  --send-credentials-to-tikv=true \
  --ratelimit 128 \
  --log-file /tmp/backupfull_$timetag.log
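
The command relies on a `timetag` variable that the post does not define. As a minimal sketch, assuming the year-month-day-hour-minute format seen in the attached log name (backupfull_2023-08-02-02-30.log), it could be set like this. The helper names `make_timetag` and `backup_log_path` are hypothetical and are not part of BR.

```shell
#!/usr/bin/env bash
# Assumption: the timetag format is inferred from the attached log file name,
# not confirmed by the original poster.
make_timetag() {
  date +%Y-%m-%d-%H-%M
}

# Build the path passed to --log-file for a given timetag.
backup_log_path() {
  echo "/tmp/backupfull_$1.log"
}

timetag="$(make_timetag)"
echo "log file: $(backup_log_path "$timetag")"
```

Computing the tag once and reusing it keeps the storage prefix and the log file name in sync for a given run.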
Log screenshot

【Resource Configuration】
3 TiDB 48c 192g
3 PD 32c 64g
8 TiKV 48c 192g
2 TiCDC 32c 64g
1 monitor 8c 16g
Attached is the log
backupfull_2023-08-02-02-30.log (77.4 MB)

| username: tidb菜鸟一只

Did TiKV report any errors during the same period? If so, check what they are.

| username: 兜不靠谱

No TiKV alerts were triggered. The backup ran at night, and the cluster status showed no anomalies when checked in the morning.

| username: tidb菜鸟一只

It seems that your backup failed due to the “Region is unavailable” error. You can follow the steps below to troubleshoot:

| username: 兜不靠谱

While checking the TiKV logs for the period of the backup error, I found the following. I am not sure whether it helps:

The continuous warnings may also be abnormal.

"epoch not match" keeps recurring.

The following image shows errors that occur frequently even when no backup is running.

The remaining entries are info-level logs with no obvious anomalies.
Also, “busy,” “oom,” and “memory” do not appear anywhere in the logs.
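
A keyword check like the one described here could be reproduced roughly as follows. This is a minimal sketch, not the poster's actual command: the function name `count_keyword_lines` is hypothetical, and the log path you pass to it depends on your own TiKV deployment.

```shell
#!/usr/bin/env bash
# Count the lines in a log file that mention busy, oom, or memory,
# ignoring case. Prints 0 when nothing matches.
count_keyword_lines() {
  local logfile="$1"
  # grep -c prints the number of matching lines but exits non-zero when
  # that number is 0, so "|| true" keeps the function safe under "set -e".
  grep -ciE 'busy|oom|memory' "$logfile" || true
}

# Only run when a log path is supplied on the command line.
if [ "$#" -gt 0 ]; then
  count_keyword_lines "$1"
fi
```

A result of 0 only shows that these keywords are absent. It does not rule out other causes such as the recurring epoch not match warnings.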

| username: 兜不靠谱

Adding some diagnostic information:
No OOM events were found in /var/log/messages.
I checked the Region and store status with pd-ctl and found no anomalies.
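
For readers who want to repeat a Region and store check, here is a minimal sketch that prints the pd-ctl commands typically used for it rather than running them. The PD address and version are taken from the original post. Invoking pd-ctl through `tiup ctl` and the helper name `pd_check_cmds` are assumptions, and these are not necessarily the commands the poster ran.

```shell
#!/usr/bin/env bash
# Assumptions: PD endpoint and version come from the original post;
# pd-ctl is assumed to be available through "tiup ctl".
PD_ADDR="${PD_ADDR:-http://10.134.171.16:2379}"
CTL_VERSION="${CTL_VERSION:-v5.4.0}"

# Print (do not execute) one store query and several Region health checks.
pd_check_cmds() {
  local pd="$1" ver="$2" kind
  echo "tiup ctl:${ver} pd -u ${pd} store"
  for kind in miss-peer extra-peer down-peer pending-peer; do
    echo "tiup ctl:${ver} pd -u ${pd} region check ${kind}"
  done
}

pd_check_cmds "$PD_ADDR" "$CTL_VERSION"
```

Printing the commands first makes it easy to review them before running anything against a production cluster.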

| username: 像风一样的男子

The Region is damaged.
Check out this article: 专栏 - 记一次sst文件损坏修复过程 | TiDB 社区 (Column - A record of repairing a corrupted SST file | TiDB Community)
