BR Backup Failure Issue: [pd] failed updateMember

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: br 备份失败问题 [pd] failed updateMember

| username: GreenGuan

In cluster version v5.1.1, when backing up to S3 storage, the backup first reports the error [pd] failed updateMember, and after a while it reports [BR:KV:ErrKVStorage]tikv storage occur I/O error. I have checked the cluster status, and both PD and TiKV are normal. Has anyone encountered this issue?

PD error:

[2022/06/27 12:52:17.438 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:xxx.xxx.xxx.89:2379 status:READY"] [errorVerbose="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:xxx.xxx.xxx.89:2379 status:READY\
github.com/tikv/pd/client.(*baseClient).updateMember\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:301\
github.com/tikv/pd/client.(*baseClient).memberLoop\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:165\
runtime.goexit\
\truntime/asm_amd64.s:1371"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:166"]
[2022/06/27 12:52:17.439 +08:00] [INFO] [base_client.go:296] ["[pd] cannot update member from this address"] [address=http://xxx.xxx.xxx.89:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:xxx.xxx.xxx.89:2379 status:READY"]
[2022/06/27 12:52:17.439 +08:00] [ERROR] [base_client.go:166] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:xxx.xxx.xxx.89:2379 status:READY"] [errorVerbose="[PD:client:ErrClientGetMember]error:rpc error: code = Canceled desc = context canceled target:xxx.xxx.xxx.89:2379 status:READY\
github.com/tikv/pd/client.(*baseClient).updateMember\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:301\
github.com/tikv/pd/client.(*baseClient).memberLoop\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:165\
runtime.goexit\
\truntime/asm_amd64.s:1371"] [stack="github.com/tikv/pd/client.(*baseClient).memberLoop\
\tgithub.com/tikv/pd@v1.1.0-beta.0.20210323121136-78679e5e209d/client/base_client.go:166"]

TiKV error:

Error: error happen in store 6 at xxx.xxx.xxx.7:20160: Io(Custom { kind: Other, error: "failed to put object Request ID: None Body: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\
<Error><Code>NoSuchUpload</Code><Message>The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.</Message><Key>BOOL-5.1.4_20220619120002/6_231692646_9153_6907cac0f12da38fe08638edcf05ef221c1408439071c10fcad460bf0c087d69_1656304290496_write.sst</Key><BucketName>s3test</BucketName><Resource>/s3test/BOOL-5.1.4_20220619120002/6_231692646_9153_6907cac0f12da38fe08638edcf05ef221c1408439071c10fcad460bf0c087d69_1656304290496_write.sst</Resource><RequestId>16FC6000C92D863D</RequestId><HostId>c8864a4e-b9ea-4860-b1e7-e4fe536de2e0</HostId></Error>" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error
Error: error happen in store 229754136 at xxx.xxx.xxx.57:20160: Io(Custom { kind: Other, error: "failed to put object Request ID: None Body: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\
<Error><Code>NoSuchUpload</Code><Message>The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.</Message><Key>BOOL-5.1.4_20220619120002/229754136_230097385_9163_72c9048c429f0a7f8174adcbc4f77697fc31c5ec231719bbf49ba6d3f7642858_1656304681695_write.sst</Key><BucketName>s3test</BucketName><Resource>/s3test/BOOL-5.1.4_20220619120002/229754136_230097385_9163_72c9048c429f0a7f8174adcbc4f77697fc31c5ec231719bbf49ba6d3f7642858_1656304681695_write.sst</Resource><RequestId>16FC605DA41EDB9E</RequestId><HostId>c8864a4e-b9ea-4860-b1e7-e4fe536de2e0</HostId></Error>" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error
Error: error happen in store 110331059 at xxx.xxx.xxx.24:20160: Io(Custom { kind: Other, error: "failed to put object Request ID: None Body: <?xml version=\"1.0\" encoding=\"UTF-8\"?>\
<Error><Code>InternalError</Code><Message>We encountered an internal error, please try again.: cause(Read failed. Insufficient number of disks online)</Message><Key>BOOL-5.1.4_20220619120002/110331059_231621883_9170_901fe8da2907c0af190553ae11100206e1b24aeb99b85c4d2ceff317483b17a3_1656305451584_default.sst</Key><BucketName>s3test</BucketName><Resource>/s3test/BOOL-5.1.4_20220619120002/110331059_231621883_9170_901fe8da2907c0af190553ae11100206e1b24aeb99b85c4d2ceff317483b17a3_1656305451584_default.sst</Resource><RequestId>16FC611BB7244917</RequestId><HostId>c8864a4e-b9ea-4860-b1e7-e4fe536de2e0</HostId></Error>" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error
| username: CuteRay | Original post link

Could you please share the configuration for br backup?

| username: CuteRay | Original post link

If this issue occurs and there are no permission problems, it is very likely that an extra '/' was left at the end of the URL configured for the s3 endpoint. That is a good place to start troubleshooting.
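
For example (a sketch with placeholder PD address, bucket, and endpoint names; only the trailing '/' differs between the two commands):

br backup full --pd xxx.148:2379 --storage 's3://mybucket/backup-test' --s3.endpoint 'http://s3.example.internal/'   # endpoint with trailing '/', may fail with errors like the above
br backup full --pd xxx.148:2379 --storage 's3://mybucket/backup-test' --s3.endpoint 'http://s3.example.internal'    # endpoint without trailing '/'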

For specific references:

Column - Troubleshooting the issue of adding a directory separator suffix to the endpoint parameter when backing up to s3 with br | TiDB Community

| username: GreenGuan | Original post link

It’s quite strange: using the same command sometimes succeeds and sometimes reports the above error.

| username: CuteRay | Original post link

How about adding this option and giving it a try: --send-credentials-to-tikv=true

| username: xiaohetao | Original post link

--send-credentials-to-tikv: passes the S3 access credentials to the TiKV nodes.
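
For reference, a backup command with this option added might look like the following (PD address, bucket, and endpoint are placeholders):

br backup full --pd xxx.148:2379 --storage 's3://mybucket/backup-test' --s3.endpoint 'http://s3.example.internal' --send-credentials-to-tikv=true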

| username: xiaohetao | Original post link

:+1::+1::+1::+1:

| username: GreenGuan | Original post link

After testing with the parameters added, there was one backup failure. The same command succeeded when executed again. It’s very strange; it feels like the br tool is not very stable. Moreover, the error reported is difficult to troubleshoot.

Error: msg:"Io(Custom { kind: Other, error: "failed to put object Error during dispatch: connection error: Connection reset by peer (os error 104)" })"

The backup parameters are as follows:

br backup full --pd xxx.148:2379 --storage s3://cxxx/Wxxxxx628002 --s3.endpoint 'http://xxx' --s3.region 'xxx' --ratelimit 30 --send-credentials-to-tikv=true --log-file /data/deploy/xxxx_20220628002.log

| username: GreenGuan | Original post link

There are still issues with the backup.

[2022/06/29 15:39:57.859 +08:00] [ERROR] [endpoint.rs:284] ["backup save file failed"] [error="Io(Custom { kind: Other, error: \"failed to put object Request ID: None Body: <html>\\r\\
<head><title>502 Bad Gateway</title></head>\\r\\
<body bgcolor=\\\"white\\\">\\r\\
<center><h1>502 Bad Gateway</h1></center>\\r\\
<hr><center>openresty</center>\\r\\
</body>\\r\\
</html>\\r\\
\" })"]
[2022/06/29 15:39:57.859 +08:00] [ERROR] [endpoint.rs:669] ["backup region failed"] [error="Io(Custom { kind: Other, error: \"failed to put object Request ID: None Body: <html>\\r\\
<head><title>502 Bad Gateway</title></head>\\r\\
<body bgcolor=\\\"white\\\">\\r\\
<center><h1>502 Bad Gateway</h1></center>\\r\\
<hr><center>openresty</center>\\r\\
</body>\\r\\
</html>\\r\\
\" })"] [end_key=] [start_key=] [region="id: 481976392 start_key: 7480000000000004FF0E5F72800000117DFFC7A7E20000000000FA end_key: 7480000000000004FF0E5F72800000117DFFD41D1B0000000000FA region_epoch { conf_ver: 169199 version: 41247 } peers { id: 481976393 store_id: 1428951 } peers { id: 517307504 store_id: 5 } peers { id: 538409640 store_id: 1236441 }"]
[2022/06/29 15:39:57.860 +08:00] [ERROR] [service.rs:86] ["backup canceled"] [error=RemoteStopped]
| username: xiaohetao | Original post link

It feels like the connection is unstable. Do you have NFS available? Try backing up to it instead.
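
If an NFS share is available, one way to rule out the S3 gateway is to back up to the mounted path with br's local storage scheme (a sketch; it assumes the share is mounted at the same path, e.g. /nfs/backup, on every TiKV node and on the host running br):

br backup full --pd xxx.148:2379 --storage 'local:///nfs/backup/test-20220629' --log-file /data/deploy/nfs_test.log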