How to Handle Accidental Deletion of SST Files in TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv里的sst文件误删除怎么处理

| username: TiDBer_3ZQWFOHI

【TiDB Usage Environment】Production Environment / Testing / POC
【TiDB Version】
【Reproduction Path】What operations were performed that led to the issue
【Encountered Issue: Issue Phenomenon and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】


| username: 有猫万事足 | Original post link

If there are 3 TiKV nodes and one of them is lost, no data is lost, but the cluster may become unavailable. You can find a new machine, dynamically scale out a new TiKV node, and the cluster will become available again. After that, try to take the problematic TiKV node offline.

If there are more than 3 TiKV nodes, no data is lost and the cluster remains available. If the load on the other TiKV nodes is not high, you can take the problematic node offline first and then dynamically scale a new node back out onto that machine.
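As a rough sketch, scaling out a replacement node and then taking the problematic one offline with TiUP might look like the following; the cluster name, topology file name, and node address are placeholders:

    # Describe the new TiKV node in a small topology file and add it to the cluster.
    tiup cluster scale-out <cluster-name> scale-out-tikv.yaml

    # Once the new node is up and Regions have been replenished,
    # take the problematic TiKV node offline.
    tiup cluster scale-in <cluster-name> -N <problem-tikv-host>:20160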

There is an article in the community column that covers a situation similar to yours, but I haven't tried what it describes myself. You can give it a try.

| username: 考试没答案 | Original post link

TiDB is a distributed system, and one of its biggest advantages is the three-replica disaster recovery mechanism. Losing one node does not matter much: only that node is lost, the business remains highly available, and you can add another node later.
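If you want to double-check how many replicas the cluster is maintaining, the PD replication settings can be inspected roughly like this; the PD address and the ctl version tag are placeholders:

    # "max-replicas" is 3 by default, which is the three-replica mechanism mentioned above.
    tiup ctl:v6.4.0 pd -u http://127.0.0.1:2379 config show replication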

| username: tidb菜鸟一只 | Original post link

With 3 replicas, the impact is not significant. If you only deleted one SST file, you can use the tikv-ctl bad-ssts command to locate the damage and repair it. If you deleted more, it is recommended to scale out a new TiKV node and then scale in the damaged one.
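For reference, the bad-ssts check is run against the data directory of a stopped TiKV instance, roughly like this; the data directory and PD address are placeholders:

    # List damaged or missing SST files, the Regions they overlap,
    # and the suggested operations for handling them.
    tikv-ctl --data-dir /data/tikv-20160 bad-ssts --pd 127.0.0.1:2379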

| username: Billmay表妹 | Original post link

Are you running a single node?

| username: TiDBer_3ZQWFOHI | Original post link

Yes, single node.

| username: 数据小黑 | Original post link

In TiDB version 6.4.0, you can use the TiKV Control tool (tikv-ctl) to take damaged SST files offline or ignore them. The specific steps are as follows:

  1. Stop the affected TiKV instance(s) using the following command:

    tiup cluster stop <cluster-name> -N <tikv-node1>:<tikv-port>,<tikv-node2>:<tikv-port>

  2. Use tikv-ctl to find the damaged or missing SST files with the following command:

    tikv-ctl --data-dir <data-dir> bad-ssts --pd <pd-endpoint>

    Here, <data-dir> is the TiKV data directory and <pd-endpoint> is the address of a PD node.

  3. For each damaged SST file, the output lists the overlapping Regions and a set of suggested operations. Apply those suggested operations, which typically remove the damaged SST file and mark the affected Regions as tombstone on this store (a sketch follows at the end of this reply).

  4. Start the TiKV instance(s) again using the following command:

    tiup cluster start <cluster-name> -N <tikv-node1>:<tikv-port>,<tikv-node2>:<tikv-port>
    

It should be noted that forcing offline or ignoring SST files may lead to data loss or data inconsistency, so it is necessary to carefully assess the risks before performing the operation.
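As a minimal sketch, the whole flow might look like the following; the paths, addresses, and Region ID are placeholders, and the tombstone command is only an example of the kind of suggested operation that bad-ssts may print for your own data:

    # 1. Stop the affected TiKV instance.
    tiup cluster stop <cluster-name> -N 192.168.1.10:20160

    # 2. Scan the store for damaged or missing SST files; note the overlapping
    #    Regions and the suggested operations in the output.
    tikv-ctl --data-dir /data/tikv-20160 bad-ssts --pd 192.168.1.1:2379

    # 3. Example of a typical suggested operation: mark the affected Region's peer
    #    on this store as tombstone so TiKV can start without the lost data.
    tikv-ctl --data-dir /data/tikv-20160 tombstone -p 192.168.1.1:2379 -r <region-id> --force

    # 4. Start the instance again.
    tiup cluster start <cluster-name> -N 192.168.1.10:20160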

| username: TiDBer_3ZQWFOHI | Original post link

It reports this error

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.