Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 清除一个节点后重新部署无法启动 (Unable to start after clearing a node and redeploying it)
【TiDB Version】v5.2.2
【Problem Encountered】 One of the TiKV node servers crashed. After restarting, it couldn’t rejoin the original cluster. I scaled it in, deleted all deployment files on that node, and followed the steps in the post 故障排查 重启tikv 节点后 id 号变了,日志报地址占用 - TiDB 的问答社区 (Troubleshooting: after restarting a TiKV node its ID changed and the log reports the address is in use - TiDB Q&A community) to delete the store ID. However, the same error still occurs.
The tikv.log on that node:
[2022/08/02 05:41:39.275 -04:00] [WARN] [client.rs:138] ["failed to update PD client"] [error="Other("[components/pd_client/src/util.rs:306]: cancel reconnection due to too small interval")"]
[2022/08/02 05:41:39.276 -04:00] [ERROR] [util.rs:460] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { code: 2-UNKNOWN, message: "duplicated store address: id:885953 address:\"192.168.0.241:20160\" version:\"5.2.2\" status_address:\"192.168.0.241:20180\" git_hash:\"7acaec5d9c809439b9b0017711f114b44ffd9a49\" start_timestamp:1659433295 deploy_path:\"/home/tidb/Data/deploy/tikv-20160/bin\" , already registered by id:5 address:\"192.168.0.241:20160\" state:Offline version:\"5.2.2\" status_address:\"192.168.0.241:20180\" git_hash:\"7acaec5d9c809439b9b0017711f114b44ffd9a49\" start_timestamp:1652383550 deploy_path:\"/home/tidb/Deploy/tikv-20160/bin\" last_heartbeat:1656335779395060454 ", details: }))"]
[2022/08/02 05:41:39.276 -04:00] [ERROR] [util.rs:469] ["reconnect failed"] [err_code=KV:PD:Unknown] [err="Other("[components/pd_client/src/util.rs:306]: cancel reconnection due to too small interval")"]
Now it won’t start at all, and the control machine shows its status as: Offline
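The error above says that the address 192.168.0.241:20160 is already registered by an older store (id 5, state Offline), while the redeployed node presents a new id (885953). As a minimal sketch, the two ids and the old store's state can be pulled out of such a log line with sed. The `line` variable below is a shortened copy of the message from the log, and the variable names are only illustrative.

```shell
# Shortened fragment of the "duplicated store address" message from tikv.log.
line='duplicated store address: id:885953 address:\"192.168.0.241:20160\" version:\"5.2.2\" , already registered by id:5 address:\"192.168.0.241:20160\" state:Offline version:\"5.2.2\"'

# Id of the store that is trying to register (the redeployed node).
new_id=$(printf '%s\n' "$line" | sed -n 's/.*duplicated store address: id:\([0-9][0-9]*\).*/\1/p')

# Id of the store that already owns the address in PD.
old_id=$(printf '%s\n' "$line" | sed -n 's/.*already registered by id:\([0-9][0-9]*\).*/\1/p')

# State PD still records for the old store.
old_state=$(printf '%s\n' "$line" | sed -n 's/.*already registered by id:[0-9]* .*state:\([A-Za-z]*\).*/\1/p')

echo "new=$new_id old=$old_id state=$old_state"
```

The old id is the one that has to be cleaned up in PD before the new store can register the same address.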
Is store 5 still available?
Use pd-ctl store 5 to check whether it is in the Tombstone state. If it is, execute store remove-tombstone.
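A rough sketch of that check follows. It assumes the store information is printed as JSON containing a "state_name" field; the `sample` string is an illustrative, abbreviated stand-in and was not captured from this cluster. The helper names `store_state` and `next_step` are hypothetical.

```shell
# Illustrative, abbreviated stand-in for the JSON describing store 5 (assumed shape).
sample='{"store": {"id": 5, "address": "192.168.0.241:20160", "state_name": "Offline"}}'

# Read JSON on stdin and print the value of "state_name" (no external JSON tool needed).
store_state() {
  sed -n 's/.*"state_name": *"\([A-Za-z]*\)".*/\1/p'
}

# Map the state to the next step discussed in this thread.
next_step() {
  case "$1" in
    Tombstone) echo "run: pd-ctl store remove-tombstone" ;;
    Offline)   echo "still Offline: the store must become Tombstone before it can be removed" ;;
    Up)        echo "store is Up: do not remove it" ;;
    *)         echo "unknown state: $1" ;;
  esac
}

state=$(printf '%s\n' "$sample" | store_state)
next_step "$state"
```

Against a live cluster, the sample would be replaced by the real output, for example `pd-ctl -u http://{pdip}:2379 store 5 | store_state`.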
After deleting the store information in pd-ctl and confirming the node is no longer needed, you can run scale-in --force to clear it.
I used --force, but it still doesn’t work.
Store 5 is no longer available; it is Offline.
Store 5 needs to disappear completely, not just be Offline.
curl -X POST http://{pdip}:2379/pd/api/v1/store/${store_id}/state?state=Tombstone
First execute this to mark it as tombstone, then
pd-ctl store remove-tombstone
This way store 5 will be completely gone.
Be cautious with this operation, make sure not to make a mistake when marking it as tombstone.
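Given that caution, one way to rehearse the two steps is a dry run that only prints the commands before anything is sent to PD. This is a sketch under assumptions: the PD address default, the `DRY_RUN` switch, and the helper names `run` and `remove_stale_store` are made up for illustration, and PD may reject the state change depending on its version.

```shell
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"  # assumption: replace with your PD endpoint
STORE_ID="${STORE_ID:-5}"                    # the stale store id from the error message
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_stale_store() {
  # Step 1: ask PD to mark the store as Tombstone (-f makes curl fail on an HTTP error).
  run curl -fsS -X POST "$PD_ADDR/pd/api/v1/store/$STORE_ID/state?state=Tombstone" || return 1
  # Step 2: purge Tombstone stores from PD's metadata.
  run pd-ctl -u "$PD_ADDR" store remove-tombstone || return 1
  # Step 3: query the store again to confirm it is gone.
  run pd-ctl -u "$PD_ADDR" store "$STORE_ID"
}

remove_stale_store
```

Review the printed commands, double-check the store id, and only then rerun with `DRY_RUN=0`.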
I checked, and the words are correct. Only the last one, “Tombstone,” might be the issue: try changing its capitalization and see if it works.
In version 5.4, only offline and up statuses are allowed in the code… In version 4.0, it was possible to mark as tombstone.
I need to update my operations manual. Let me look into how to proceed.
Can’t running store delete 5 turn it into a Tombstone? There is no leader or region on it either.
You can delete it, but the status hasn’t changed. This server has already crashed and is unreachable.
Try this:
curl -X DELETE http://{pdip}:2379/pd/api/v1/store/5?force=true
Label it as physically_destroyed, and then check if the new node can go online.
Looking at the code, a store in the “physically_destroyed” or “tombstone” state should not prevent a TiKV store with the same address from coming online.
In practice, however, it does have an impact: the node just won’t start and keeps reporting errors. Moreover, the two store IDs in the error are clearly different, yet it still reports a duplicate, which is a bit confusing.