Failed to install TiFlash and unable to delete it

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 安装tiflash不成功,又无法删除

| username: vesa

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v7.1.0
[Reproduction Path] Uninstalled TiFlash, upgraded TiDB to v7.1.0, then used the scale-out command to add TiFlash nodes back, but the operation was interrupted by a timeout error.

[Encountered Problem: Problem Phenomenon and Impact]
Checking the cluster status shows that the TiFlash node is in Tombstone status but cannot be deleted; running the purge command has no effect and produces no output.

Using scale-in to remove the TiFlash node also reports an error.

[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: ShawnYan | Original post link

Check the TiFlash logs to see if there are any errors during the purge.
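
A quick way to do that, as a sketch: the path below assumes a default deploy_dir of /tidb-deploy/tiflash-9000, so adjust it to the deploy_dir in your topology file.

    # On the TiFlash host: scan the main and error logs for recent failures.
    # /tidb-deploy/tiflash-9000 is an assumed default deploy_dir -- adjust to your topology.
    grep -iE "error|fatal" /tidb-deploy/tiflash-9000/log/tiflash.log | tail -n 50
    tail -n 100 /tidb-deploy/tiflash-9000/log/tiflash_error.log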

| username: xfworld | Original post link

You can follow these steps to manually decommission a TiFlash node and remove it from the TiUP display (a consolidated command sketch follows the list):

  1. Use the tiup cluster display <cluster-name> command to check the status of the TiFlash node and confirm that its status is Tombstone.
  2. Use the tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store command to find the store ID corresponding to the TiFlash node.
  3. Wait for the store of the TiFlash node to disappear from PD or its status to change to Tombstone, then stop the TiFlash process.
  4. Manually delete the TiFlash data files (which can be found in the data_dir directory specified in the TiFlash configuration in the cluster topology file).
  5. Use the tiup cluster scale-in <cluster-name> --node <tiflash_ip>:<tiflash_port> --force command to remove the information of the TiFlash node.
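
A consolidated sketch of the steps above, using hypothetical addresses (10.0.1.5:2379 for PD, 10.0.1.9:9000 for the TiFlash node) and the default data_dir; substitute the values from your own topology:

    # 1. Confirm the TiFlash node shows Tombstone
    tiup cluster display <cluster-name>
    # 2. Find the store ID of the TiFlash node in PD
    tiup ctl:v7.1.0 pd -u http://10.0.1.5:2379 store
    # 3. Once the store is gone from PD or marked Tombstone, stop the TiFlash process on its host
    # 4. On the TiFlash host, remove the data files under the data_dir from the topology file
    rm -rf /tidb-data/tiflash-9000      # assumed default data_dir
    # 5. Remove the node from the TiUP metadata
    tiup cluster scale-in <cluster-name> --node 10.0.1.9:9000 --force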

Before all TiFlash nodes stop running, if you have not canceled replication of all tables to TiFlash, you must manually clean up the TiFlash-related data replication rules in PD; otherwise, the TiFlash node cannot be decommissioned successfully. The steps to manually clean up all TiFlash-related data replication rules in PD are as follows:

  1. View all data replication rules related to TiFlash in the current PD instance:
     tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> operator show
  2. Manually cancel all data replication rules related to TiFlash:
     tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> operator cancel <operator-id>

Where <operator-id> is the ID of the data replication rule related to TiFlash.
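
For example, with a hypothetical PD endpoint of 10.0.1.5:2379 on v7.1.0, the flow based on the two commands above looks like this; the operator ID comes from the output of the first command:

    # List current operators, then cancel each TiFlash-related one by its ID.
    # 10.0.1.5:2379 is a hypothetical PD address.
    tiup ctl:v7.1.0 pd -u http://10.0.1.5:2379 operator show
    tiup ctl:v7.1.0 pd -u http://10.0.1.5:2379 operator cancel <operator-id>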

| username: Sean007 | Original post link

Please check if the SSH mutual trust between the master control machine and the TiFlash host is functioning properly.
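
A minimal check from the TiUP control machine, assuming the deploy user is tidb and the TiFlash host is 10.0.1.9 (both are assumptions; use your own values):

    # Non-interactive SSH probe; it should print ssh-ok without asking for a password.
    ssh -o BatchMode=yes -o ConnectTimeout=5 tidb@10.0.1.9 'echo ssh-ok'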

| username: redgame | Original post link

The cause is not obvious from the description, but you can try deleting the node manually: first stop all operations in the TiDB cluster associated with the node, then manually delete the TiFlash node's data files and configuration information, and finally remove the node's information from the cluster metadata so that the cluster correctly recognizes its status change.
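
A rough sketch of that sequence, assuming the TiFlash node ID is a hypothetical 10.0.1.9:9000; tiup cluster prune is the TiUP command for cleaning up instances already in Tombstone state, which may be what the "purge" mentioned above refers to:

    # Stop the instance, drop it from the cluster metadata, then clean up Tombstone records.
    # 10.0.1.9:9000 is a hypothetical node ID -- adjust to your topology.
    tiup cluster stop <cluster-name> --node 10.0.1.9:9000
    tiup cluster scale-in <cluster-name> --node 10.0.1.9:9000 --force
    tiup cluster prune <cluster-name>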