BR Backup Error: “unexcepted error, stop to retry”

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br 备份报错 unexcepted error, stop to retry

| username: Hacker_ojLJ8Ndr

【TiDB Version】
6.1.1
【Problem Encountered】
Backup occasionally fails
【Reproduction Path】
Back up a single database: br backup db …
【Symptoms and Impact】
【Attachment】
Backup log error:


| username: tidb狂热爱好者

The error is clear: the table is locked.

| username: Hacker_ojLJ8Ndr

I see txnLockFast, but can BR handle this situation on its own? Or is there a parameter for adjusting backoffer.maxSleep?

| username: tidb狂热爱好者

This should be fixable. You can run the backup after stopping the reads and writes that hold the lock.

| username: Hacker_ojLJ8Ndr

I checked the TiKV logs for the period when this error occurred. The locked tables are not in the database being backed up; they are stats_histograms and stats_meta in the mysql database.

| username: Lucien-卢西恩

Search the TiKV logs for Lockfast to see what the main errors are. Which regions and keys are involved?

| username: Hacker_ojLJ8Ndr

Here is today’s full backup log:

I searched for “Lockfast” in the TiKV log but found no matching entries.

I searched for “lock” in the TiKV log and found the following error:

The key locked at 16:16:24, reported with err=“Key is locked (will clean up)”, decodes as follows:

| username: Lucien-卢西恩

Check how the table with table_id=21 relates to BR and to your business logic. Could you schedule the backup at a different time to reduce conflicts with queries?
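If rescheduling alone does not avoid the conflict, one generic workaround is to rerun the failed command after a pause, on the assumption that a transient lock has cleared by then. The following is a minimal sketch of such a wrapper and not a BR feature: run_with_retry and its parameters are hypothetical names, and it assumes the command passed in is safe to rerun (for example, that the caller handles any partially written backup storage path).

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the exit code of the last run. The wait doubles after each
    failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            return 0
        if attempt < attempts:
            delay = base_delay * 2 ** (attempt - 1)
            print(
                f"attempt {attempt} exited with {code}; retrying in {delay:.0f}s",
                file=sys.stderr,
            )
            sleep(delay)
    return code
```

The full backup command line, including its arguments, would be passed as cmd. A wrapper like this only hides the symptom, so the lock holder on the system tables still needs to be identified.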

| username: Hacker_ojLJ8Ndr

table_id = 21 is a system table under the mysql database and is not within my backup scope. Besides the system table with id 21, there is sometimes also a system table with id 23.
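For readers who want to map a locked key from the TiKV log to a table ID themselves, here is a minimal sketch. It assumes the key is printed as a hex string in TiKV's memcomparable encoding (8-byte groups, each followed by a marker byte) and that it is a table key starting with the prefix t followed by an 8-byte table ID. The function names and the sample key are illustrative and are not taken from the logs in this thread.

```python
SIGN_MASK = 1 << 63
GROUP = 8      # payload bytes per memcomparable group
MARKER = 0xFF  # marker byte of a full, unpadded group


def decode_memcomparable(encoded: bytes) -> bytes:
    """Strip the group padding and marker bytes from an encoded key."""
    raw = bytearray()
    for start in range(0, len(encoded), GROUP + 1):
        chunk = encoded[start:start + GROUP + 1]
        if len(chunk) < GROUP + 1:
            raise ValueError("truncated group in encoded key")
        pad = MARKER - chunk[GROUP]  # number of zero padding bytes
        if pad > GROUP:
            raise ValueError("invalid marker byte")
        raw += chunk[:GROUP - pad]
        if pad:  # a padded group is always the last one
            break
    return bytes(raw)


def decode_int(buf: bytes) -> int:
    """Decode an 8-byte big-endian integer whose sign bit was flipped."""
    value = int.from_bytes(buf, "big") ^ SIGN_MASK
    return value - (1 << 64) if value >= SIGN_MASK else value


def table_id_from_log_key(hex_key: str) -> int:
    """Return the table ID embedded in a hex-encoded table key."""
    raw = decode_memcomparable(bytes.fromhex(hex_key))
    if len(raw) < 9 or raw[:1] != b"t":
        raise ValueError("not a table key")
    return decode_int(raw[1:9])


# Constructed sample: a row key of table 21 with row ID 1.
sample = "7480000000000000FF155F728000000000FF0000010000000000FA"
print(table_id_from_log_key(sample))
```

The resulting ID can then be matched against the TIDB_TABLE_ID column of information_schema.tables to find the schema and table name.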

| username: Hacker_ojLJ8Ndr

The errors are still occurring. Every time a backup fails, the backup log reports txnLockFast, and the TiKV log contains entries like this: [store.rs:2665] [“broadcasting unreachable”] [unreachable_store_id=50288811] [store_id=8]. However, the TiKV service is running normally. Are the two related?

| username: Hacker_ojLJ8Ndr

Related new post: BR 备份报错 txnLockFast (BR backup error txnLockFast) - TiDB Q&A community

| username: jansu-dev

It would be best to provide Clinic diagnostic data. The new post also suggests some approaches that you can look at.

| username: Hacker_ojLJ8Ndr

I have posted the Clinic link in the new post. Please take a look: Clinic Service