Unable to Start After Redeploying a Node Following Cleanup

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 清除一个节点后重新部署无法启动

| username: Mingdr

【TiDB Version】v5.2.2
【Problem Encountered】 The server hosting one of the TiKV nodes crashed. After it was restarted, the node could not rejoin the original cluster. I scaled the node in, deleted all deployment files on it, and followed the steps in this post, 故障排查 重启tikv 节点后 id 号变了,日志报地址占用 - TiDB 的问答社区 (Troubleshooting: TiKV node ID changed after restart, log reports address already in use - TiDB Q&A community), to delete the store ID. However, the same error still occurs.
The tikv.log on that node:

[2022/08/02 05:41:39.275 -04:00] [WARN] [client.rs:138] ["failed to update PD client"] [error="Other("[components/pd_client/src/util.rs:306]: cancel reconnection due to too small interval")"]
[2022/08/02 05:41:39.276 -04:00] [ERROR] [util.rs:460] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { code: 2-UNKNOWN, message: "duplicated store address: id:885953 address:\"192.168.0.241:20160\" version:\"5.2.2\" status_address:\"192.168.0.241:20180\" git_hash:\"7acaec5d9c809439b9b0017711f114b44ffd9a49\" start_timestamp:1659433295 deploy_path:\"/home/tidb/Data/deploy/tikv-20160/bin\" , already registered by id:5 address:\"192.168.0.241:20160\" state:Offline version:\"5.2.2\" status_address:\"192.168.0.241:20180\" git_hash:\"7acaec5d9c809439b9b0017711f114b44ffd9a49\" start_timestamp:1652383550 deploy_path:\"/home/tidb/Deploy/tikv-20160/bin\" last_heartbeat:1656335779395060454 ", details: }))"]
[2022/08/02 05:41:39.276 -04:00] [ERROR] [util.rs:469] ["reconnect failed"] [err_code=KV:PD:Unknown] [err="Other("[components/pd_client/src/util.rs:306]: cancel reconnection due to too small interval")"]

Now it won’t start at all, and the control machine shows its status as Offline.

| username: TiDBer_jYQINSnf | Original post link

Is store 5 still available?

| username: TiDBer_jYQINSnf | Original post link

Run pd-ctl store 5 to check whether store 5 is in the Tombstone state. If it is, run store remove-tombstone.
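For reference, the check-then-remove flow above can also be done against PD's HTTP API. This is a minimal sketch, not an official tool: it assumes GET /pd/api/v1/store/{id} returns the same JSON that pd-ctl prints, with the state under store.state_name, and that pd-ctl's store remove-tombstone maps to DELETE /pd/api/v1/stores/remove-tombstone. PD_URL is a hypothetical placeholder.

```python
import json
import urllib.request

# Hypothetical PD endpoint; replace with one of your PD members.
PD_URL = "http://127.0.0.1:2379"


def store_state_name(store_info: dict) -> str:
    """Return the state name ("Up", "Offline", "Tombstone") from a store JSON object."""
    return store_info["store"]["state_name"]


def is_tombstone(store_info: dict) -> bool:
    """Only stores already in the Tombstone state are purged by remove-tombstone."""
    return store_state_name(store_info) == "Tombstone"


def get_store(store_id: int, pd_url: str = PD_URL) -> dict:
    """Fetch one store's info, like 'pd-ctl store <id>' (assumed endpoint)."""
    with urllib.request.urlopen(f"{pd_url}/pd/api/v1/store/{store_id}") as resp:
        return json.load(resp)


def remove_tombstones(pd_url: str = PD_URL) -> int:
    """Ask PD to purge all Tombstone stores; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{pd_url}/pd/api/v1/stores/remove-tombstone", method="DELETE"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a live PD, calling get_store(5) and then remove_tombstones() only when is_tombstone returns True mirrors the advice. In this thread store 5 was Offline, so this step alone would not have removed it.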

| username: wakaka | Original post link

After deleting the store in pd-ctl and confirming it is no longer needed, you can run scale-in --force to clear it.
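The two-step cleanup suggested here could be scripted roughly as follows. This sketch only builds and prints the commands by default. The cluster name, PD address, and ctl version are hypothetical placeholders, and it assumes the usual tiup ctl:<version> pd and tiup cluster scale-in --force invocations.

```python
import shlex
import subprocess
from typing import List


def pd_store_delete_cmd(ctl_version: str, pd_url: str, store_id: int) -> List[str]:
    """pd-ctl 'store delete <id>' run through TiUP."""
    return ["tiup", f"ctl:{ctl_version}", "pd", "-u", pd_url, "store", "delete", str(store_id)]


def scale_in_force_cmd(cluster: str, node: str) -> List[str]:
    """Remove an unreachable node from the TiUP topology with --force."""
    return ["tiup", "cluster", "scale-in", cluster, "--node", node, "--force"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print("+", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; dry_run defaults to True, so nothing is executed.
    run(pd_store_delete_cmd("v5.2.2", "http://127.0.0.1:2379", 5))
    run(scale_in_force_cmd("my-cluster", "192.168.0.241:20160"))
```

Keeping dry_run on by default makes it easy to review the exact store ID and node address before anything destructive runs.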

| username: Mingdr | Original post link

I used --force, but it still doesn’t work.

| username: Mingdr | Original post link

Store 5 is no longer available; it is Offline.

| username: TiDBer_jYQINSnf | Original post link

Store 5 has to be removed completely; being Offline is not enough.

curl -X POST "http://{pdip}:2379/pd/api/v1/store/{store_id}/state?state=Tombstone"

First run this to mark store 5 as Tombstone, then run:
pd-ctl store remove-tombstone
After that, store 5 will be gone completely.
Be careful with this operation: make sure you mark the correct store ID as Tombstone.
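Expressed as HTTP requests, the two steps above look roughly like this minimal sketch using Python's standard urllib. The PD address in the usage is a hypothetical placeholder, and the remove-tombstone endpoint is assumed to be what pd-ctl calls. As later replies note, this request failed on the asker's v5.2.2 cluster, and newer PD versions reportedly accept only Up and Offline here.

```python
import urllib.parse
import urllib.request


def set_store_state_request(pd_url: str, store_id: int, state: str) -> urllib.request.Request:
    """POST /pd/api/v1/store/{id}/state?state=<state>, same as the curl command."""
    query = urllib.parse.urlencode({"state": state})
    return urllib.request.Request(
        f"{pd_url}/pd/api/v1/store/{store_id}/state?{query}", method="POST"
    )


def remove_tombstone_request(pd_url: str) -> urllib.request.Request:
    """DELETE /pd/api/v1/stores/remove-tombstone (assumed pd-ctl equivalent)."""
    return urllib.request.Request(
        f"{pd_url}/pd/api/v1/stores/remove-tombstone", method="DELETE"
    )


def send(req: urllib.request.Request) -> int:
    """Send a request and return the HTTP status; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Double-check the store ID before calling send on the state-change request; marking the wrong store as Tombstone is exactly the mistake the reply warns about.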

| username: Mingdr | Original post link

Running it returned an error.

| username: TiDBer_jYQINSnf | Original post link

I checked, and the spelling is correct. As for the last word, “Tombstone,” try changing it to “Tombstone” and see if that works.

| username: Mingdr | Original post link

Still not working.

| username: TiDBer_jYQINSnf | Original post link

In the v5.4 code, only the Up and Offline states are allowed here… In v4.0 you could still mark a store as Tombstone this way.
I need to update my operations notes. Let me look into another way to do this.

| username: Mingdr | Original post link

Sure, no problem :+1:

| username: TiDBer_jYQINSnf | Original post link

Can’t store delete 5 turn it into Tombstone? There are no leaders or Regions on it anyway.

| username: Mingdr | Original post link

The delete command goes through, but the state doesn’t change. That server has already crashed and is unreachable.

| username: TiDBer_jYQINSnf | Original post link

Try this:

curl -X DELETE "http://{pdip}:2379/pd/api/v1/store/5?force=true"
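The difference between a plain store delete 5 and this forced variant appears to be just the force=true query parameter on the same DELETE endpoint. A minimal sketch that only builds the request; that the non-forced form matches pd-ctl store delete is an assumption, and the PD address used in the usage is a hypothetical placeholder.

```python
import urllib.request


def delete_store_request(pd_url: str, store_id: int, force: bool = False) -> urllib.request.Request:
    """Build DELETE /pd/api/v1/store/{id}, optionally with ?force=true.

    Without force this is assumed to match 'pd-ctl store delete <id>'; per this
    thread, force=true additionally marks the store as physically_destroyed.
    """
    url = f"{pd_url}/pd/api/v1/store/{store_id}"
    if force:
        url += "?force=true"
    return urllib.request.Request(url, method="DELETE")
```

To actually send it, pass the request to urllib.request.urlopen, for example delete_store_request("http://127.0.0.1:2379", 5, force=True).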

| username: Mingdr | Original post link

| username: TiDBer_jYQINSnf | Original post link

It is now marked as physically_destroyed. Check whether the new node can come online.
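One way to check both points (that store 5 carries the flag, and whether a new store at the same address has registered) is to look at PD's store list. A minimal sketch that works on already-fetched JSON; it assumes GET /pd/api/v1/stores returns {"stores": [{"store": {...}}, ...]} and that the flag appears as store.physically_destroyed, omitted when false.

```python
from typing import Dict, List


def stores_at_address(stores_json: Dict, address: str) -> List[Dict]:
    """Summarize every store registered at the given address."""
    result = []
    for entry in stores_json.get("stores", []):
        meta = entry["store"]
        if meta.get("address") == address:
            result.append({
                "id": meta["id"],
                "state": meta.get("state_name"),
                # Assumed JSON key; protobuf-generated JSON omits it when false.
                "physically_destroyed": bool(meta.get("physically_destroyed", False)),
            })
    return result
```

If the new node registered successfully, a second entry with a new ID and state Up should show up next to store 5 for 192.168.0.241:20160.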

| username: Mingdr | Original post link

Got it, thanks~

| username: TiDBer_jYQINSnf | Original post link

Judging from the code, a store that is in the Tombstone state or marked as physically_destroyed should not prevent a new TiKV store with the same address from coming online.
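The rule described here can be modeled in a few lines. This is a simplified model based on a reading of PD's store-registration check, not PD's actual code, and details may differ between PD versions. Two things follow from it: the check compares addresses rather than IDs, which is why stores 885953 and 5 in the log still conflict, and an Offline store that is not physically destroyed, like store 5 at the time of that log, keeps its address blocked.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Store:
    id: int
    address: str
    state: str  # "Up", "Offline" or "Tombstone"
    physically_destroyed: bool = False


def duplicate_address_error(existing: Iterable[Store], new: Store) -> Optional[str]:
    """Return an error message if `new` may not register, else None (simplified model)."""
    for s in existing:
        # Tombstone and physically destroyed stores no longer hold their address.
        if s.state == "Tombstone" or s.physically_destroyed:
            continue
        # A different store ID on the same address is rejected; the same ID (a restart) is fine.
        if s.id != new.id and s.address == new.address:
            return (
                f"duplicated store address: id:{new.id} address:{new.address}, "
                f"already registered by id:{s.id} state:{s.state}"
            )
    return None


if __name__ == "__main__":
    old = Store(5, "192.168.0.241:20160", "Offline")
    new = Store(885953, "192.168.0.241:20160", "Up")
    print(duplicate_address_error([old], new))
```

Under this model, setting physically_destroyed on store 5 should be enough for store 885953 to register, so an error that persists after the forced delete would point to something the model does not capture.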

| username: Mingdr | Original post link

But it does seem to have an impact: the node just won’t start and keeps reporting errors. Moreover, the two IDs in the error are clearly different, yet it still reports a duplicate, which is a bit confusing.