BR cannot back up properly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br无法正常备份

| username: TiDBer_VtkBZH6I

[TiDB Usage Environment] Production Environment / Test / Poc
Test

[TiDB Version]
V5.0.1

[Reproduction Path] What operations were performed when the issue occurred
While using BR to back up to MinIO, the backup hung partway through several times. I used kill -9 to terminate the backup process.

[Encountered Issue: Problem Phenomenon and Impact]
The backup cannot proceed normally.

[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]




| username: xfworld | Original post link

Based on the log information, it seems to be a network issue.

| username: 裤衩儿飞上天 | Original post link

First, from any one of your TiKV instances, check whether you can write data to your MinIO normally.
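A minimal sketch of a first reachability check that could be run on each TiKV host before testing an actual write. It only verifies that a TCP connection to the MinIO endpoint can be opened within a timeout; the helper name `check_tcp` is made up for illustration, and the host and port must be replaced with your own MinIO address.

```python
import socket
import time


def check_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Try to open a TCP connection; return (success, human-readable detail)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.3f}s"
    except OSError as exc:
        # Covers timeouts, refused connections, and name resolution failures.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, `check_tcp("<your-minio-host>", 9000)` (9000 being MinIO's default API port, assuming it was not changed) returning `False` with a timeout on one TiKV host but not on the others would point at the network path from that host rather than at BR itself.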

| username: TiDBer_VtkBZH6I | Original post link

Data can be written and shows up in MinIO, but it is particularly slow, and errors are reported in the logs.

| username: TiDBer_VtkBZH6I | Original post link

Very slow

| username: TiDBer_VtkBZH6I | Original post link

A TiKV node had gone offline earlier and had not been dealt with. Now that we need to back up the data and the backup is particularly slow, we restarted the offline node.

| username: 胡杨树旁 | Original post link

Looking at the log information below, it seems to be about the primary lock? Is there an issue with the TiKV node?

| username: TiDBer_VtkBZH6I | Original post link

How to check if there is an issue with a TiKV node?

| username: TiDBer_VtkBZH6I | Original post link

A single table can currently be backed up successfully, but the speed is very slow: 15kb/s.

| username: TiDBer_VtkBZH6I | Original post link

Regarding the question of whether the log information is about the primary lock:

The node status display shows everything as normal.

| username: TiDBer_VtkBZH6I | Original post link

[2023/04/24 11:26:26.970 +08:00] [ERROR] [endpoint.rs:763] ["backup region failed"] [err_code=KV:Unknown] [err="Io(Custom { kind: Other, error: \"failed to put object Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)\" })"] [end_key=74800000000000AC3C5F698000000000000002FB] [start_key=74800000000000AC3C5F69800000000000000200] [region="id: 23910263 start_key: 74800000000000ACFF3C00000000000000F8 end_key: 74800000000000ACFF3E00000000000000F8 region_epoch { conf_ver: 8 version: 15099 } peers { id: 23910264 store_id: 1 } peers { id: 23910265 store_id: 7 } peers { id: 23910266 store_id: 94001 }"]
[2023/04/24 11:26:26.970 +08:00] [ERROR] [endpoint.rs:792] ["backup failed to send response"] [err_code=KV:Unknown] [err="TrySendError { kind: Disconnected }"]
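When there are many lines like the two above, it can help to pull out just the message, the store IDs of the region's peers, and whether the error text looks network-related. The following is a rough sketch under the assumption that lines follow the `[time] [LEVEL] [source] ["message"] ...` layout seen here; the function name `summarize` and the list of hint substrings are illustrative choices, not part of TiKV or BR.

```python
import re

# Matches the leading "[timestamp] [LEVEL] [source] ["message"]" part of a log line.
HEADER = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] \[(?P<source>[^\]]+)\] '
    r'\["(?P<message>[^"]*)"\]'
)

# Substrings suggesting the failure is on the network path to the storage endpoint.
# This list is an assumption for illustration and is not exhaustive.
NETWORK_HINTS = ("Connection timed out", "tcp connect error", "Connection refused")


def summarize(line: str) -> dict:
    """Extract header fields, peer store IDs, and a network-related flag."""
    # Normalize typographic quotes that copy/paste or forum software may introduce.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a recognized log line")
    info = match.groupdict()
    # Peers appear in the region field as "store_id: <number>".
    info["store_ids"] = [int(s) for s in re.findall(r"store_id: (\d+)", line)]
    info["network_related"] = any(hint in line for hint in NETWORK_HINTS)
    return info
```

Applied to the first line above, this flags the error as network-related (a TCP connect timeout while putting an object), which points at the path between TiKV and the storage endpoint rather than at a lock problem.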

| username: TiDBer_VtkBZH6I | Original post link

The backup now works normally. Earlier, an IP conflict had affected the cluster's connection to one particular TiKV node, which made the backup extremely slow. To restore normal cluster operation, we configured ARP on the cluster hosts, and the cluster returned to normal. The backup, however, was still slow and kept reporting errors: single-table backups succeeded, but throughput was only 15kb/s. Given the poor network speed, we suspected a problem with MinIO. We later realized that ARP had been configured for the cluster hosts but not for the MinIO server, so MinIO could not correctly identify the TiKV hosts. After configuring ARP on the MinIO server as well, the backup proceeded normally.
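The thread does not say how ARP was inspected or set, so the following is only a diagnostic sketch under assumptions: Linux hosts, with the text of `/proc/net/arp` collected from two machines (for example a TiKV host and the MinIO server). It reports IP addresses that the two machines resolve to different hardware addresses, which is the kind of disagreement an IP conflict can produce. The function names are made up for illustration.

```python
def parse_arp_table(text: str) -> dict[str, str]:
    """Parse /proc/net/arp-style text into {ip: mac}."""
    entries = {}
    # The first line is the column header: IP address, HW type, Flags, HW address, Mask, Device.
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, mac = fields[0], fields[3].lower()
        # An all-zero hardware address means the entry is unresolved; skip it.
        if mac == "00:00:00:00:00:00":
            continue
        entries[ip] = mac
    return entries


def mismatched_entries(view_a: dict[str, str], view_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return IPs that both hosts know but map to different MAC addresses."""
    return {
        ip: (view_a[ip], view_b[ip])
        for ip in view_a.keys() & view_b.keys()
        if view_a[ip] != view_b[ip]
    }
```

An empty result does not prove the network is healthy; it only means the two ARP views agree on the addresses they have in common.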
