Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: br backup reports error “unexcepted error, stop to retry”
【TiDB Version】
6.1.1
【Problem Encountered】
Backup occasionally fails
【Reproduction Path】
Back up a single database: br backup db …
【Problem Phenomenon and Impact】
【Attachment】
Backup log error:
The error is clear: the table is locked.
I see txnLockFast, but can BR handle this situation itself? Or is there a parameter to adjust backoffer.maxSleep?
This should be fixable. You can run the backup after stopping the reads and writes that hold the lock.
I checked the TiKV logs for the time period when this error occurred. The locked tables were not in the database being backed up; they were stats_histograms and stats_meta in the mysql database.
Check the TiKV logs for Lockfast to see what the main errors are. What are the specific regions and keys involved?
Here is today’s full backup log:
Searched for “Lockfast” in the tikv log but found no errors.
Searched for “lock” in the tikv log, and the error is as follows:
The decoded result of the key locked at 16:16:24 with err=“Key is locked (will clean up)” is as follows:
Check the relationship between the table with table_id=21 and BR, as well as the business logic. Is it possible to stagger the backup to reduce the instances of query conflicts?
table_id = 21 is a system table under the mysql database and is not within my backup scope. Besides the system table with id 21, there is sometimes also a system table with id 23.
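For readers who want to recover a table ID from a locked key themselves, here is a minimal Python sketch. It assumes the key is printed as hex in TiKV's memcomparable form (8-byte groups, each followed by a padding marker) and that the underlying TiDB key has the layout t{table_id}_r{row_handle} or t{table_id}_i{index_id} with sign-flipped big-endian integers. The sample key is constructed for illustration and is not the key from this thread; the function names are hypothetical, and clustered-index (non-integer) handles are not covered.

```python
SIGN_MASK = 1 << 63  # integers are stored with the sign bit flipped


def decode_memcomparable(encoded: bytes) -> bytes:
    """Undo the 8-byte-group encoding; stops at the first padded group."""
    out = bytearray()
    for start in range(0, len(encoded), 9):
        group = encoded[start:start + 9]
        if len(group) < 9:
            raise ValueError("truncated group")
        pad = 0xFF - group[8]  # marker byte says how many bytes are padding
        if pad > 8:
            raise ValueError("invalid padding marker")
        out += group[:8 - pad]
        if pad > 0:
            # Last group reached; anything after it (e.g. a timestamp
            # suffix) is ignored by this sketch.
            return bytes(out)
    raise ValueError("missing terminating group")


def decode_int(raw: bytes) -> int:
    """Decode one sign-flipped big-endian 64-bit integer."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes")
    return int.from_bytes(raw, "big") - SIGN_MASK


def decode_key(hex_key: str):
    """Return (table_id, kind, id) where kind is 'record' or 'index'."""
    raw = decode_memcomparable(bytes.fromhex(hex_key))
    if len(raw) < 19 or raw[:1] != b"t":
        raise ValueError("not a table key")
    kind = {b"_r": "record", b"_i": "index"}.get(raw[9:11])
    if kind is None:
        raise ValueError("unknown key kind")
    return decode_int(raw[1:9]), kind, decode_int(raw[11:19])


if __name__ == "__main__":
    # Illustrative key only: table 21, row handle 1.
    sample = "7480000000000000FF155F728000000000FF0000010000000000FA"
    print(decode_key(sample))  # (21, 'record', 1)
```

Once the table ID is known, it can be matched against the tables in the backup scope to confirm whether the locked key belongs to a system table, as it did here.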
The errors are still occurring. I found that every time a backup fails, the backup log reports a txnLockFast error, and the TiKV log contains entries like this: [store.rs:2665] [“broadcasting unreachable”] [unreachable_store_id=50288811] [store_id=8]. However, the TiKV service is normal. Is this related?
It would be best to collect diagnostic data with Clinic. The new post also suggests some methods; please take a look.
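To line up TiKV entries like the one above with the backup failure times, a small parser can help. The sketch below is a rough illustration, assuming each log line follows the bracketed layout [time] [level] [source] ["message"] [key=value]... and that no field contains a closing bracket. The timestamp and level in the sample line are invented; only the source, message, and fields come from the post. The function names are hypothetical.

```python
import re

# Every field is assumed to be wrapped in square brackets
# and to contain no "]" itself.
FIELD_RE = re.compile(r"\[([^\]]*)\]")


def parse_log_line(line: str):
    """Split one bracketed log line into a dict, or return None."""
    parts = FIELD_RE.findall(line)
    if len(parts) < 4:
        return None
    record = {
        "time": parts[0],
        "level": parts[1],
        "source": parts[2],
        "message": parts[3].strip('"'),
        "fields": {},
    }
    for part in parts[4:]:
        key, sep, value = part.partition("=")
        if sep:  # keep only well-formed key=value fields
            record["fields"][key] = value.strip('"')
    return record


def mentions(record, needle: str) -> bool:
    """Case-insensitive search in the message and all field values."""
    needle = needle.lower()
    haystack = [record["message"], *record["fields"].values()]
    return any(needle in text.lower() for text in haystack)


if __name__ == "__main__":
    # Timestamp and level are made up for this example.
    sample = ('[2022/01/01 16:16:24.000 +08:00] [INFO] [store.rs:2665] '
              '["broadcasting unreachable"] '
              '[unreachable_store_id=50288811] [store_id=8]')
    rec = parse_log_line(sample)
    print(rec["source"], rec["message"], rec["fields"])
    print(mentions(rec, "lock"))
```

Running this over the log file line by line and keeping only records where mentions(rec, "lock") is true, within the failure window, gives a shorter list to compare against the BR log.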
I have placed the clinic link in the new post, please take a look: Clinic Service