How to remove an offline TiFlash store?

Application environment:

Production

TiDB version:

v7.5.0

Reproduction method:

Problem:

Hi,
I have 6 servers on which TiDB is installed. I had a TiFlash node on server number 2, and that server was removed completely. Now I cannot scale out another TiFlash node on that server.

As far as I can tell, the problem is that the store of the old TiFlash node still exists.

I tried to remove the store with the commands below, but without success. It says the store is deleted, but it is not!

Attachment:

I would appreciate any help.

To remove an offline TiFlash store, you can follow the steps below:

  1. First, you need to identify the Store ID corresponding to the offline TiFlash node. You can use the pd-ctl tool to view the Store ID. If you deployed with TiUP, you can replace pd-ctl with tiup ctl:v<CLUSTER_VERSION> pd.

  2. Once you have the Store ID of the offline TiFlash node, you can manually remove it from the cluster. You can use the following command to remove the TiFlash node:

    tiup cluster scale-in <cluster-name> --node <offline-node-address>

    Replace <cluster-name> with the name of your TiDB cluster and <offline-node-address> with the address of the offline TiFlash node.

By following these steps, you should be able to successfully remove the offline TiFlash store from your TiDB cluster. If you encounter any issues during the process, please let me know so I can assist you further.

That is not possible, because the node no longer appears when I run the ‘tiup cluster display’ command. Only its store still exists, and I need to delete the store directly.

I understand your situation. To directly delete the store of the offline TiFlash node, you can use the following steps:

  1. Access the PD control tool either through pd-ctl or tiup ctl:v<CLUSTER_VERSION> pd depending on your deployment method.

  2. Once you have accessed the PD control tool, you can use the store command to view the store ID corresponding to the offline TiFlash node. The command should look like this:

    tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store
    

    Make sure to replace <CLUSTER_VERSION> with your actual cluster version, <pd_ip> with the IP address of an active PD instance, and <pd_port> with the port number of the PD instance.

  3. Once you have identified the store ID of the offline TiFlash node, you can delete the store directly with the store delete <store_id> command in the PD control tool.

By following these steps, you should be able to delete the store of the offline TiFlash node directly from your TiDB cluster. If you encounter any difficulties or need further assistance, feel free to ask for help.
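As a rough sketch of steps 2 and 3, the helper below wraps the pd-ctl calls. The PD address, the store ID 42, and the `pd_ctl` helper name are placeholders for illustration, not values from this thread. With `DRY_RUN=1` (the default here) the commands are only printed, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Placeholders: adjust to your cluster before running anything.
CLUSTER_VERSION="v7.5.0"           # version reported in this thread
PD_ADDR="http://127.0.0.1:2379"    # assumed address of a live PD instance
DRY_RUN="${DRY_RUN:-1}"            # 1 = print the command, 0 = execute it

# pd_ctl is a hypothetical wrapper around `tiup ctl:<version> pd`.
pd_ctl() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "tiup ctl:${CLUSTER_VERSION} pd -u ${PD_ADDR} $*"
  else
    tiup "ctl:${CLUSTER_VERSION}" pd -u "${PD_ADDR}" "$@"
  fi
}

# Step 2: list all stores; a TiFlash store carries the label engine=tiflash.
pd_ctl store

# Step 3: mark the store for removal (42 is a made-up store ID).
pd_ctl store delete 42
```

Note that `store delete` only marks the store for removal. PD still has to move the store's Region peers elsewhere before the store disappears from the list.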

I did that, but the store was not deleted. It says the store was deleted successfully, but when I query the store ID again, it is still there.

I can only answer the question based on the official documentation. Please feel free to ask any other question related to TiDB or TiDB Cloud.

Got it. Let me redirect this question to our engineers. Thanks for your report.

Thank you, I’m waiting to see how we can fix it.

pd-ctl store delete <store_id> only removes the node logically; it cannot remove it physically.

You can use pd-ctl store cancel-delete <store_id> to restore the node logically and then use tiup cluster scale-in to physically remove the node.
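A sketch of that sequence, printed rather than executed. The cluster name `my-cluster`, the node address `10.0.1.2:9000`, the PD address, and store ID 42 are hypothetical placeholders, and `restore_then_scale_in` is a made-up helper name.

```shell
#!/usr/bin/env bash
CLUSTER_VERSION="v7.5.0"           # version reported in this thread
PD_ADDR="http://127.0.0.1:2379"    # assumed address of a live PD instance
CLUSTER_NAME="my-cluster"          # hypothetical cluster name
NODE_ADDR="10.0.1.2:9000"          # hypothetical node ID as listed by `tiup cluster display`
STORE_ID=42                        # made-up store ID

# Print the two commands in the order they should be run.
restore_then_scale_in() {
  echo "tiup ctl:${CLUSTER_VERSION} pd -u ${PD_ADDR} store cancel-delete ${STORE_ID}"
  echo "tiup cluster scale-in ${CLUSTER_NAME} --node ${NODE_ADDR}"
}

restore_then_scale_in
```

The second command can only work if the node is still listed in the topology shown by `tiup cluster display`.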

Before removing a store, PD always tries to keep the number of Region peers matching the TiFlash replica rules. When you run pd-ctl store delete <store_id> and PD cannot find any other live TiFlash store to hold the Regions of the TiFlash replicas, it cannot remove the old TiFlash store.

You can handle it in either of these ways:

  • Check the existing replicas by executing select * from information_schema.tiflash_replica;, then run ALTER TABLE ... SET TIFLASH REPLICA 0 to remove all the TiFlash replicas. Wait for PD to remove all the obsolete peers; after that, the store is removed automatically.
  • Or deploy new TiFlash instance(s) using another IP or port, so that PD treats them as new stores and moves the Region peers onto those new TiFlash instances. The old deleted store is removed automatically after that.
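
The first option can be sketched as follows. The helper name `gen_drop_replica_sql` and the sample table names are made up for illustration; the input is assumed to be tab-separated `TABLE_SCHEMA` and `TABLE_NAME` values taken from `information_schema.tiflash_replica`.

```shell
#!/usr/bin/env bash
# Read "schema<TAB>table" lines on stdin and emit one ALTER statement per table.
gen_drop_replica_sql() {
  while IFS="$(printf '\t')" read -r schema table; do
    # Skip blank or incomplete lines.
    [ -n "$schema" ] && [ -n "$table" ] || continue
    printf 'ALTER TABLE `%s`.`%s` SET TIFLASH REPLICA 0;\n' "$schema" "$table"
  done
}

# Sample input with made-up table names.
printf 'test\torders\nsales\titems\n' | gen_drop_replica_sql

# Assumed usage against a live cluster (host, port, and user are placeholders):
#   mysql -h <tidb_ip> -P 4000 -u root -N -B \
#     -e "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tiflash_replica" \
#     | gen_drop_replica_sql
```

The generated statements remove every TiFlash replica, so review them before applying them to a cluster whose queries depend on TiFlash.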

Thanks, I did that and the store status has now switched to Down, but I still cannot scale in because that TiFlash node does not exist in the topology and I cannot see it in the ‘tiup cluster display’ output. So when I try to scale in, I see this error:

Thanks. Since it is a production server, I cannot turn off TiFlash because it is necessary for running projects, but your second solution worked. I just changed the ports, and now I am able to add a new TiFlash node.

Also, the old leaving store was deleted automatically after I deployed a TiFlash node with a different port on the same server.

Thank you :pray: