How to force an SST file offline or mark it as ignored

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 怎么强制下线这个sst文件或者标记忽略这个sst

| username: TiDBer_3ZQWFOHI

The file here was accidentally deleted. How can I force it offline or mark it as ignored?

| username: 胡杨树旁 | Original post link

Was the SST file deleted? Was it deleted directly at the server level?

| username: TiDBer_3ZQWFOHI | Original post link

Yes, accidentally deleted an SST file on the server.

| username: tidb狂热爱好者 | Original post link

Lossy recovery

| username: TiDBer_jYQINSnf | Original post link

If an SST file has been deleted, RocksDB will fail to start. You can simply remove this node: TiKV keeps 3 replicas by default, so you can just add another node to replace it. The command to force-delete a store is:
curl -X DELETE "http://{pdip}:2379/pd/api/v1/store/{store_id}?force"
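A minimal sketch of wrapping the force-delete call in a dry-run helper, so the exact URL can be reviewed before anything destructive is sent. The function names `pd_store_url` and `force_delete_store_cmd` are made up for illustration; only the endpoint path comes from the command above.

```shell
# Hypothetical helper: build the PD endpoint for one store.
#   $1 = PD address as host:port, $2 = store id
pd_store_url() {
  printf 'http://%s/pd/api/v1/store/%s' "$1" "$2"
}

# Hypothetical helper: print (do not run) the force-delete command.
force_delete_store_cmd() {
  case "$2" in
    ''|*[!0-9]*) echo "store id must be numeric: $2" >&2; return 1 ;;
  esac
  printf 'curl -X DELETE "%s?force"\n' "$(pd_store_url "$1" "$2")"
}
```

Calling `force_delete_store_cmd {pdip}:2379 {store_id}` only prints the command; copy and run it once you are sure the store id is the right one.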

| username: TiDBer_3ZQWFOHI | Original post link

There is only a single node. If I delete it, all the data will be lost, right?

| username: TiDBer_jYQINSnf | Original post link

A single TiKV? Then wouldn’t everything be gone? Isn’t it already tombstoned? Your environment doesn’t seem like a very serious setup.

| username: TiDBer_jYQINSnf | Original post link

Since you only deleted one SST file, you can try this:

tikv-ctl ldb manifest_dump --path={datadir}/db

This lists the SST files at each level, so you can check whether the deleted SST file appears in the output. If it was only a temporary SST file produced during compaction, the deletion has no impact, and after a restart the data will be recovered automatically from the WAL.
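A rough sketch of that check, assuming you have saved the manifest_dump output to a text file first. It assumes each file entry in the dump starts with `<file_number>:<file_size>[`, which is how RocksDB prints it as far as I know; compare against your own output before trusting the result. The function name `manifest_lists_file` is made up.

```shell
# Hypothetical helper: exit 0 if the manifest dump on stdin lists the
# given SST file number, exit 1 if it does not.
# Assumption: entries look like " 15:2048[1 .. 9][...]".
manifest_lists_file() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: manifest_lists_file <file_number>" >&2; return 2 ;;
  esac
  # Require a non-digit (or line start) before the number so that
  # searching for 15 does not match 115.
  grep -Eq "(^|[^0-9])$1:[0-9]+\["
}
```

For example, `manifest_lists_file 15 < manifest.txt && echo "still referenced"` tells you the file is one the manifest still expects to find.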

If it is a regular file that the manifest still references, TiKV most likely won’t start. If you really want to ignore it and try anyway, you can look for an ldb subcommand that modifies the manifest so that it no longer records this file.

Given that you only have one replica and the data isn’t that important, losing it might not be a big deal. It’s not worth the trouble.

| username: TiDBer_jYQINSnf | Original post link

I found that RocksDB indeed has this:

tikv-ctl ldb --db=<dbpath> unsafe_remove_sst_file <file_number>

Before removing it, check the manifest to see which DB and column family (CF) this SST belongs to. If it holds actual data, TiKV might still be able to start. If it is not data and belongs to the raft CF, then it probably won’t start.

If the SST is under raftdb, it belongs to the Raft log. If it doesn’t contain metadata, there might still be a chance.

Before running any of the above, make a backup copy of your data. If you don’t understand these steps, don’t attempt them. I haven’t done this before either; it’s just a possible method that might work :crazy_face:.
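For the backup step, a minimal sketch, assuming TiKV is stopped and there is enough free disk space for a full copy. The function name `backup_data_dir` and the destination path are placeholders, not anything TiKV provides.

```shell
# Hypothetical helper: copy a data directory before touching the manifest.
#   $1 = existing data directory, $2 = backup destination (must not exist)
backup_data_dir() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing path: $2" >&2
    return 1
  fi
  # -a keeps permissions, ownership and timestamps where possible.
  cp -a "$1" "$2"
}
```

Something like `backup_data_dir {datadir} {datadir}.bak` before running unsafe_remove_sst_file gives you a copy to fall back on if the attempt makes things worse.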

| username: h5n1 | Original post link

I tested this on version 6.1.1, and the following method can bring TiKV up:

  1. Copy another SST file and rename the copy to the name of the mistakenly deleted file. Choose a relatively small file with a later file number to avoid affecting system tables.
  2. After that, run tikv-ctl ldb unsafe_remove_sst_file **15** --db=/data/v631/tikv/data/db. Here, 15 is the number of the deleted SST file. If you skip step 1, this command reports an error saying the file cannot be found.

This method still loses the data that was in the deleted SST file, and it’s unknown what other data might be lost.
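On the file number: RocksDB names SST files with a zero-padded number, so 000015.sst corresponds to file number 15. A small sketch for getting the number from a file name; the function name `sst_file_number` is made up, and it assumes the usual `<digits>.sst` naming.

```shell
# Hypothetical helper: turn an SST file name or path into the file number
# expected by unsafe_remove_sst_file.
sst_file_number() {
  sst_name=$(basename "$1" .sst)
  case "$sst_name" in
    ''|*[!0-9]*) echo "not an SST file name: $1" >&2; return 1 ;;
  esac
  # Strip the zero padding; an all-zero name falls back to 0.
  sst_num=$(printf '%s' "$sst_name" | sed 's/^0*//')
  printf '%s\n' "${sst_num:-0}"
}
```

For example, `sst_file_number /data/v631/tikv/data/db/000015.sst` prints 15, which is the value used in step 2.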