How to fix a downed TiKV node with an identified bad region?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv有一台down了,找到了bad-region,怎样修复?

| username: LiuJunCheng

[TiDB Usage Environment] Production Environment
[TiDB Version] V4.0.11
[Issue Encountered] One server had a memory failure. After repair, the components on that server still could not come online and the database was inaccessible. After restarting three servers, all three TiDB instances are currently down, one TiKV instance is down, and the other components are normal.
[Attachment: Screenshot/Log/Monitoring]
Running the command ./tikv-ctl --db /tidbdata/deploy/tikv-20160/data/db bad-regions to find the bad regions produced the output below. How can this data be repaired?

```
thread '<unnamed>' panicked at 'rocksdb background error. db: raft, reason: compaction, error: Corruption: Bad table magic number: expected 9863518390377041911, found 4733287020440918081 in /tidbdata/deploy/tikv-20160/data/raft/140817.sst', components/engine_rocks/src/event_listener.rs:66:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Segmentation fault
```
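
The panic suggests the raft RocksDB instance itself is corrupted (a bad SST magic number), so bad-regions crashes before it can report anything. For reference, a minimal sketch of the check, assuming a tiup-managed cluster; the cluster name my-cluster and the node address are hypothetical, the paths are from the post:

```shell
# Stop the affected TiKV first: tikv-ctl in local mode needs exclusive
# access to the data directory.
tiup cluster stop my-cluster -N <host>:20160

# List the regions whose data fails consistency checks.
./tikv-ctl --db /tidbdata/deploy/tikv-20160/data/db bad-regions
```
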
| username: tidb菜鸟一只 | Original post link

Column - Three Strategies for Handling Abnormal TiKV Scale-Down Offline | TiDB Community
See section 4.2.
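
If the section being pointed to is the unsafe-recover approach for multi-replica failure, here is a minimal sketch under that assumption; the store ID 4 and the PD address are hypothetical, and tikv-server must be stopped on every node where tikv-ctl runs in local mode:

```shell
# Look up the store ID of the downed TiKV (PD address is hypothetical).
pd-ctl -u http://<pd-host>:2379 store

# On EVERY surviving TiKV node, with its tikv-server stopped, remove the
# failed store from the Raft groups of the regions it participated in.
./tikv-ctl --db /tidbdata/deploy/tikv-20160/data/db unsafe-recover remove-fail-stores -s 4 --all-regions
```

Restart the surviving TiKV nodes afterwards. Note that unsafe-recover can lose recently written data, so it is a last resort for when no healthy majority remains.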

| username: Soysauce520 | Original post link

If there are more than 3 TiKV nodes, you can scale in the failed node and then scale out to recover. With only 3 TiKV nodes, scaling in leaves no store to take the third replica, so the operation gets stuck. TiDB servers can be scaled in and scaled out directly.
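
A minimal sketch of that flow with tiup; the cluster name my-cluster, the node addresses, and scale-out.yaml are hypothetical:

```shell
# TiDB servers are stateless, so they can simply be scaled in and back out.
tiup cluster scale-in my-cluster -N <tidb-host>:4000
tiup cluster scale-out my-cluster scale-out.yaml

# A TiKV store that is already down and unrecoverable can be forced out;
# this is only safe while the other replicas of its regions are healthy.
tiup cluster scale-in my-cluster -N <tikv-host>:20160 --force
```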

| username: zhang_2023 | Original post link

Column - Three Strategies for Handling Abnormal TiKV Scale-Down Offline | TiDB Community

| username: GreenGuan | Original post link

The first reply should be enough.

| username: TiDBer_HErMeXDz | Original post link

If you have 3 TiKV nodes and one fails, you can add another TiKV node.

| username: TiDBer_QYr0vohO | Original post link

If one out of three TiKV nodes is damaged, you can first add a new node and then remove the damaged one.

| username: TiDBer_嘎嘣脆 | Original post link

Add one instance to get back to 3 replicas, wait for the Regions to rebalance, and then remove the faulty one.
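
One way to watch that rebalance with pd-ctl before removing the faulty node (the PD address is hypothetical):

```shell
# Regions still missing a replica; this list should drain to empty as PD
# re-replicates onto the new TiKV.
pd-ctl -u http://<pd-host>:2379 region check miss-peer

# Check that region and leader counts are converging on the new store.
pd-ctl -u http://<pd-host>:2379 store
```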

| username: 胡杨树旁 | Original post link

Scale out a new TiKV server first, and after the scale-out completes, scale in the original faulty one.
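
A minimal sketch of that order of operations with tiup; the new host, cluster name, and directories are hypothetical:

```shell
# Write a one-node TiKV scale-out topology (host and directories hypothetical).
cat > scale-out.yaml <<'EOF'
tikv_servers:
  - host: <new-host-ip>
    port: 20160
    deploy_dir: /tidbdata/deploy/tikv-20160
    data_dir: /tidbdata/deploy/tikv-20160/data
EOF

# Scale out first; once regions have rebalanced, scale in the faulty node
# (with --force if it is already down and cannot be drained).
tiup cluster scale-out my-cluster scale-out.yaml
tiup cluster scale-in my-cluster -N <faulty-host>:20160 --force
```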

| username: TiDBer_fbU009vH | Original post link

Column - Three Strategies for Handling Abnormal TiKV Scale-Down Offline | TiDB Community

| username: oceanzhang | Original post link

First scale out, then scale in.

| username: zhang_2023 | Original post link

Just scale it back up.

| username: zhaokede | Original post link

Scale down one node.

| username: TiDBer_QYr0vohO | Original post link

Scale out and then scale in.

| username: TiDBer_RjzUpGDL | Original post link

First scale out, then scale in.

| username: TiDBer_JUi6UvZm | Original post link

:+1: :+1: :+1: