Issues with TiDB Backup Tools

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 备份工具问题

| username: GreenGuan

I encountered the following error when using the backup tool br to back up to a self-built S3 storage:

  1. What causes this problem?
  2. How does the br tool handle a 4xx error when it encounters one?

TiDB version is 5.4.2

[BR:KV:ErrKVStorage]tikv storage occur I/O error\nerror happen in store 7181537 at xxx:20160: Io(Custom { kind: Other, error: \"failed to put object rusoto error Request ID: None Body: <Error><Code>InvalidPart</Code><Message>One or more of the specified parts could not be found. The part might not have been uploaded, or the specified entity tag might not have matched the part's entity tag.</Message>
| username: Billmay表妹 | Original post link

It’s somewhat similar to this question; you can take a look~

| username: GreenGuan | Original post link

The backup does not fail every time; the issue occurs only occasionally. I asked colleagues on the company’s S3 team, and they suspect that after an uploadPart request fails in BR, incorrect part information is passed to the subsequent completeMultipartUpload call, which causes the failure. So I would like to ask whether any community users have encountered similar errors, and it would be great if the developers could briefly explain the multipart upload logic of the BR tool.
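
The failure mode the S3 team suspects can be illustrated with a toy model. The sketch below is neither BR's code nor the self-built store's code: `ToyMultipartStore` and its methods are hypothetical names that only mimic the three S3 multipart calls (create, UploadPart, CompleteMultipartUpload). It assumes the client keeps a part in its list even though that part's upload never reached the store, which is the suspicion described above and not a confirmed bug.

```python
import hashlib


class InvalidPart(Exception):
    """Stands in for the S3 InvalidPart error code seen in the BR log."""


class ToyMultipartStore:
    """In-memory stand-in for an S3-compatible store (hypothetical, illustration only)."""

    def __init__(self):
        self._uploads = {}  # upload_id -> {"key": str, "parts": {part_number: (etag, data)}}
        self._next_id = 0
        self.objects = {}   # key -> assembled object bytes

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self._uploads[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Toy ETag: MD5 hex digest of the part body.
        etag = hashlib.md5(data).hexdigest()
        self._uploads[upload_id]["parts"][part_number] = (etag, data)
        return etag

    def complete_multipart_upload(self, upload_id, parts):
        stored = self._uploads[upload_id]["parts"]
        # Every listed part must exist on the server with a matching ETag.
        for part_number, etag in parts:
            if part_number not in stored or stored[part_number][0] != etag:
                raise InvalidPart(f"part {part_number} not found or ETag mismatch")
        key = self._uploads[upload_id]["key"]
        self.objects[key] = b"".join(stored[n][1] for n, _ in parts)
        del self._uploads[upload_id]


def demo():
    store = ToyMultipartStore()
    upload_id = store.create_multipart_upload("backup/example.sst")
    etag1 = store.upload_part(upload_id, 1, b"aaaa")
    # Assumption: the upload of part 2 failed, yet the client still lists it.
    etag2_never_stored = hashlib.md5(b"bbbb").hexdigest()
    try:
        store.complete_multipart_upload(
            upload_id, [(1, etag1), (2, etag2_never_stored)]
        )
    except InvalidPart as exc:
        return str(exc)
    return "completed"
```

In this model `demo()` ends in `InvalidPart` because part 2 is listed at completion time but was never stored. A real store can also return the same error for other reasons, such as parts that were removed before the upload was completed.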

| username: Billmay表妹 | Original post link

Are there any other nodes with a pending offline status?

| username: Billmay表妹 | Original post link

Here are a few similar issues with the same error, you can refer to them~

| username: GreenGuan | Original post link

The TiKV nodes of this cluster showed no anomalies during the backup. This I/O error has been reported before, but it seems more likely to be a problem with either BR or S3.

| username: dba-kit | Original post link

The BR tool supports the standard S3 protocol, but other vendors or self-built S3-like clusters may have some incompatibilities. What kind of S3 are you using?

| username: GreenGuan | Original post link

Scenario: BR backup data to S3
Issue: During BR's multipart upload, some parts were missing when the upload was completed, which returned a 4xx error and caused the entire backup task to exit.
Expectation from the community: Please help check whether this part of the logic has a bug that records some parts incorrectly. Is it reasonable for the whole backup task to exit in this case, and could a switch be added to skip the failure?

| username: dba-kit | Original post link

Could you please clarify what the time in this screenshot represents? Additionally, could you provide information on which technology you used to build your own S3? Is it Minio?

| username: GreenGuan | Original post link

  1. The screenshot shows the status and uptime of the store in pd-ctl.
  2. This is a self-developed S3. What language is the technology you mentioned developed in?

| username: GreenGuan | Original post link

I consulted a colleague on the S3 team, and it seems highly likely that the issue is with the br tool. Is there any update?

| username: dba-kit | Original post link

How long had BR been running when the error occurred? It seems highly likely that it’s an issue with S3. You might want to check with your colleagues who develop the S3-like service to see whether there is a maximum retention time for uploaded parts of an unfinished multipart upload.

| username: dba-kit | Original post link

Alternatively, you can set up an S3 storage using MinIO locally to test if there are any errors when backing up to MinIO.

| username: GreenGuan | Original post link

The task lasted about 2 hours before it reported the backup error and exited. The maximum retention time for parts is 2 days.
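
As a rough sanity check of the retention question, the sketch below computes which parts would already have expired when the upload is completed. It assumes retention is measured from each part's upload time, which the thread does not confirm for this self-built store; `expired_parts` and the timestamps are hypothetical and chosen only to match the figures reported above (about 2 hours of runtime, 2 days of retention).

```python
from datetime import datetime, timedelta


def expired_parts(part_upload_times, complete_time, retention):
    """Return the part numbers whose age at completion exceeds the retention window."""
    return sorted(
        number
        for number, uploaded_at in part_upload_times.items()
        if complete_time - uploaded_at > retention
    )


# Hypothetical timeline: parts uploaded over a run of about 2 hours.
start = datetime(2023, 1, 1, 0, 0)
parts = {1: start, 2: start + timedelta(hours=1)}
finish = start + timedelta(hours=2)

# With a 2-day retention window nothing has expired after 2 hours.
print(expired_parts(parts, finish, timedelta(days=2)))

# With a much shorter (hypothetical) 90-minute window, part 1 would be gone.
print(expired_parts(parts, finish, timedelta(minutes=90)))
```

Under this assumption, a 2-hour run is far inside a 2-day window, so part expiry alone would not explain the error.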

| username: dba-kit | Original post link

Well, then it’s unclear. My backups to Alibaba Cloud OSS and Tencent Cloud COS work without compatibility problems, but the S3 protocol is essentially defined by Amazon itself and is still evolving. Tools can only try to be as compatible as possible; 100% compatibility cannot be guaranteed…