TiKV Online Restore

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv online restore

| username: banana_jian

Does TiDB v6.1 support running unsafe-recover remove-fail-stores online?

I tried it and still got errors:
tikv-ctl --data-dir /DMBITXVC/tidb/tidb-data/tikv-20160/ unsafe-recover remove-fail-stores -s 2154,2155 --all-regions
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:604] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:712] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1092] ["error while open kvdb: Storage Engine IO error: While lock file: /DMBITXVC/tidb/tidb-data/tikv-20160/db/LOCK: Resource temporarily unavailable"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1095] ["LOCK file conflict indicates TiKV process is running. Do NOT delete the LOCK file and force the command to run. Doing so could cause data corruption."]
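
The last two log lines are the relevant ones: with --data-dir, tikv-ctl opens the data directory itself, and that fails while a running TiKV process holds db/LOCK. A minimal sketch for detecting this case in a script, assuming the tikv-ctl output is piped in on stdin; the function name `is_lock_conflict` is hypothetical and not part of any TiKV tool:

```shell
# Hypothetical helper: exits 0 if the tikv-ctl output on stdin contains the
# LOCK conflict message, meaning a TiKV process still owns the data dir.
is_lock_conflict() {
  grep -q 'LOCK file conflict indicates TiKV process is running'
}

# Example input: the ERROR line quoted in the post above.
sample='[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1095] ["LOCK file conflict indicates TiKV process is running. Do NOT delete the LOCK file and force the command to run. Doing so could cause data corruption."]'

if printf '%s\n' "$sample" | is_lock_conflict; then
  echo "TiKV is still running on this data dir"
fi
```

The check only matches the message text, so it would need updating if a different TiKV version words the error differently.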

| username: banana_jian

Environment:
A 5-node TiKV cluster. I deleted the data on two TiKV nodes that hold replicas of the target region:
/tidb/tidb-data/tikv-20160$ rm -rf ./*
The cluster status then becomes:
10.137.32.3:3000 grafana 10.137.32.3 3000 linux/x86_64 Up - /opt/tidb/tidb-deploy/grafana-3000
10.137.32.3:2379 pd 10.137.32.3 2379/2380 linux/x86_64 Up|L|UI /opt/tidb/tidb-data/pd-2379 /opt/tidb/tidb-deploy/pd-2379
10.137.32.3:9090 prometheus 10.137.32.3 9090/12020 linux/x86_64 Up /opt/tidb/tidb-data/prometheus-9090 /opt/tidb/tidb-deploy/prometheus-9090
10.137.32.3:4000 tidb 10.137.32.3 4000/10080 linux/x86_64 Up - /opt/tidb/tidb-deploy/tidb-4000
10.137.32.3:20160 tikv 10.137.32.3 20160/20180 linux/x86_64 Up /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20160 tikv 10.137.32.4 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20161 tikv 10.137.32.4 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
10.137.32.5:20160 tikv 10.137.32.5 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.5:20161 tikv 10.137.32.5 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
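
The failed TiKV nodes can be picked out of a listing like this by filtering on the role and status columns. A minimal sketch, assuming whitespace-separated rows with the node address in column 1, the role in column 2, and the status in column 6, as in the rows above; the function name `list_disconnected_tikv` is hypothetical:

```shell
# Hypothetical helper: print the address of every TiKV row whose status
# column reads "Disconnected". Expects the status listing on stdin.
list_disconnected_tikv() {
  awk '$2 == "tikv" && $6 == "Disconnected" { print $1 }'
}

# Example with three of the rows shown above.
list_disconnected_tikv <<'EOF'
10.137.32.3:20160 tikv 10.137.32.3 20160/20180 linux/x86_64 Up /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20160 tikv 10.137.32.4 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.5:20160 tikv 10.137.32.5 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
EOF
```

The addresses printed are not store IDs. The remove-fail-stores commands take numeric store IDs, which pd-ctl's `store` command reports for each address.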

At this point, running unsafe-recover remove-fail-stores on a node that still holds one replica reports an error, but the command runs normally once that node is stopped. Is something wrong with my procedure?

| username: songxuecheng

The TiKV process is still running. The command should be executed after the TiKV process has stopped.

| username: ddhe9527

Does "online" mean that the TiKV process does not need to be stopped?

| username: banana_jian

My mistake. I checked again and found that the online removal has to be done through pd-ctl, and there is no need to stop TiKV.

pd-ctl -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Starting component `ctl`: /home/tidb/.tiup/components/ctl/v6.1.0/ctl pd -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Success!
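
The pd-ctl call above can be wrapped so that the command is printed for review before it is actually run. A minimal sketch, assuming pd-ctl is on PATH (the post runs it through the tiup ctl component instead); the function name `remove_failed_stores` and the `DRY_RUN` variable are hypothetical. The `show` subcommand mentioned in the final comment comes from the PD Control documentation, not from this thread:

```shell
# Hypothetical wrapper around the command used in the post.
# Usage: remove_failed_stores <pd_addr> <store_id> [<store_id> ...]
remove_failed_stores() {
  pd_addr="$1"; shift
  # Join the remaining arguments (store IDs) with commas.
  ids=$(IFS=,; printf '%s' "$*")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so it can be checked first.
    echo "pd-ctl -u $pd_addr unsafe remove-failed-stores $ids"
  else
    pd-ctl -u "$pd_addr" unsafe remove-failed-stores "$ids"
  fi
}

# Prints the command instead of running it (DRY_RUN defaults to 1).
remove_failed_stores http://10.137.32.3:2379 3138 3137

# To check recovery progress afterwards:
#   pd-ctl -u http://10.137.32.3:2379 unsafe remove-failed-stores show
```

Defaulting to a dry run is a deliberate choice here, since the operation is unsafe by name and removing the wrong store IDs can lose data.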

Sorry for the trouble, everyone.

| username: songxuecheng

No problem, as long as it is cleared up.
