Can TiKV start normally by excluding the corrupted SST files in the cluster?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群多个sst文件损坏,是否可以排除这些错误sst文件让tikv正常启动

| username: guoyanliang

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4.0
[Encountered Problem] The cluster has 5 TiKV nodes, and three of them have a small number (single digits) of corrupted SST files, which prevents TiKV from starting. Is there any way to ignore these errors, or to discard some data, so that TiKV can start normally?
[Reproduction Path] Operations performed that led to the issue
[Problem Phenomenon and Impact] TiKV cannot start normally in the production environment, and data cannot be recovered


| username: h5n1 | Original post link

TiKV Control User Guide | PingCAP Docs. Try referring to the section on handling bad SST files.

| username: guoyanliang | Original post link

Hello, since there is no suggested solution, I tried removing the abnormal SST file, but it shows this error. Is there any other command or method to start TiKV first?

| username: h5n1 | Original post link

Currently, this is the only method I can find.

| username: zhouzeru | Original post link

Try listing the corrupted SST files and cleaning them up.
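
One way to collect the names of the corrupted files is to scan the TiKV or RocksDB log text for lines that mention corruption. The following is only a minimal sketch: the log line format, the sample text, and the function name `corrupted_sst_files` are assumptions made for illustration, not something taken from this thread or from TiKV itself.

```python
import re

# Assumption: a corruption message names the file as "<path>/<digits>.sst".
SST_PATH = re.compile(r"[\w./-]*\d+\.sst")

def corrupted_sst_files(log_text):
    """Return the distinct .sst paths found on lines that mention corruption."""
    found = []
    for line in log_text.splitlines():
        if "corruption" not in line.lower():
            continue  # skip lines that do not report a corruption
        for path in SST_PATH.findall(line):
            if path not in found:
                found.append(path)
    return found

# Hypothetical log excerpt, used only to exercise the function.
sample = (
    "data/tikv/db/000014.sst: Corruption: Bad table magic number "
    "in data/tikv/db/000014.sst\n"
    "[INFO] compaction finished for 000020.sst\n"
)
print(corrupted_sst_files(sample))  # ['data/tikv/db/000014.sst']
```

The resulting list is only a starting point. Each file should still be checked against what `tikv-ctl` reports before anything is removed.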

| username: db_user | Original post link

May I ask how many replicas there are? If there are five replicas, you might try unsafe-recover as a last resort, if there really is no other way.

| username: h5n1 | Original post link

The example command output in the official documentation has an issue. The value here should be the SST file number, which is the same as the number in the file name. Refer to my test.
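
To illustrate the point about the file number matching the file name, here is a small sketch that derives the number from an SST path. The path and the helper name `sst_file_number` are hypothetical. Whether a given command expects the number or the full path should be checked against the help output of the `tikv-ctl` version in use.

```python
from pathlib import PurePosixPath

def sst_file_number(path):
    """Return the numeric part of an SST file name, e.g. '000014.sst' -> 14."""
    name = PurePosixPath(path)
    # Only accept names of the form "<decimal digits>.sst".
    if name.suffix != ".sst" or not name.stem.isdecimal():
        raise ValueError(f"not a numbered SST file: {path!r}")
    return int(name.stem)

# Hypothetical path, used only as an example.
print(sst_file_number("/path/to/tikv/db/000014.sst"))  # 14
```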

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.