Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv online restore
Does TiDB v6.1 support running unsafe-recover remove-fail-stores online, that is, without stopping TiKV?
I tried it and still got errors:
tikv-ctl --data-dir /DMBITXVC/tidb/tidb-data/tikv-20160/ unsafe-recover remove-fail-stores -s 2154,2155 --all-regions
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:604] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:712] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1092] ["error while open kvdb: Storage Engine IO error: While lock file: /DMBITXVC/tidb/tidb-data/tikv-20160/db/LOCK: Resource temporarily unavailable"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1095] ["LOCK file conflict indicates TiKV process is running. Do NOT delete the LOCK file and force the command to run. Doing so could cause data corruption."]
Environment:
A 5-node TiKV cluster. I deleted the data on two of the TiKV nodes that hold replicas of the target Region:
/tidb/tidb-data/tikv-20160$ rm -rf ./*
The cluster status then becomes:
10.137.32.3:3000 grafana 10.137.32.3 3000 linux/x86_64 Up - /opt/tidb/tidb-deploy/grafana-3000
10.137.32.3:2379 pd 10.137.32.3 2379/2380 linux/x86_64 Up|L|UI /opt/tidb/tidb-data/pd-2379 /opt/tidb/tidb-deploy/pd-2379
10.137.32.3:9090 prometheus 10.137.32.3 9090/12020 linux/x86_64 Up /opt/tidb/tidb-data/prometheus-9090 /opt/tidb/tidb-deploy/prometheus-9090
10.137.32.3:4000 tidb 10.137.32.3 4000/10080 linux/x86_64 Up - /opt/tidb/tidb-deploy/tidb-4000
10.137.32.3:20160 tikv 10.137.32.3 20160/20180 linux/x86_64 Up /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20160 tikv 10.137.32.4 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20161 tikv 10.137.32.4 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
10.137.32.5:20160 tikv 10.137.32.5 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.5:20161 tikv 10.137.32.5 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
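As a rough aid for reading the listing above, the sketch below filters it for TiKV instances whose status is Disconnected. It assumes the whitespace-separated column order seen in the listing (ID, role, host, ports, OS/arch, status, directories); the variable and function names are hypothetical. Note that it prints instance addresses, not store IDs. The store IDs passed to the recovery commands have to be looked up separately, for example with the pd-ctl `store` command.

```shell
# Sample rows copied from the status listing above.
display_output='10.137.32.3:20160 tikv 10.137.32.3 20160/20180 linux/x86_64 Up /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20160 tikv 10.137.32.4 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20161 tikv 10.137.32.4 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161
10.137.32.5:20160 tikv 10.137.32.5 20160/20180 linux/x86_64 Disconnected /opt/tidb/tidb-data/tikv-20160 /opt/tidb/tidb-deploy/tikv-20160
10.137.32.5:20161 tikv 10.137.32.5 20161/20181 linux/x86_64 Up /tidb-data/tikv-20161 /tidb-deploy/tikv-20161'

# Hypothetical helper: read the listing on stdin and print the ID (column 1)
# of every row whose role (column 2) is "tikv" and status (column 6) is "Disconnected".
disconnected_tikv() {
  awk '$2 == "tikv" && $6 == "Disconnected" { print $1 }'
}

printf '%s\n' "$display_output" | disconnected_tikv
```

With the rows above, this prints the two instances whose data directories were wiped: 10.137.32.4:20160 and 10.137.32.5:20160.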
At this point, running unsafe-recover remove-fail-stores on a node that still holds one replica reports the error above, but it runs normally after I stop that node. Am I doing something wrong?
The TiKV process is still running. The tikv-ctl command should be executed only after the TiKV process has stopped.
Doesn't "online" mean that the TiKV process does not need to be stopped?
My mistake. I checked again and found that the online method uses pd-ctl to remove the failed stores, and there is no need to stop TiKV.
pd-ctl -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Starting component `ctl`: /home/tidb/.tiup/components/ctl/v6.1.0/ctl pd -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Success!
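For anyone repeating this, here is a minimal dry-run sketch of the online procedure. It only prints the pd-ctl command lines so they can be reviewed before being run by hand. The PD address and store IDs are the ones from this thread; the `pd_unsafe` helper is hypothetical. The `show` subcommand for checking progress is taken from my recollection of the Online Unsafe Recovery documentation, so please verify it against your PD version before relying on it.

```shell
# Values from this thread; replace them with your own PD endpoint and failed store IDs.
PD_ADDR="http://10.137.32.3:2379"
FAILED_STORES="3138,3137"

# Hypothetical helper: print (do not execute) a pd-ctl unsafe-recovery command line.
# $1 is either a comma-separated list of store IDs or a subcommand such as "show".
pd_unsafe() {
  printf 'pd-ctl -u %s unsafe remove-failed-stores %s\n' "$PD_ADDR" "$1"
}

# Step 1: ask PD to remove the failed stores (TiKV keeps running).
pd_unsafe "$FAILED_STORES"

# Step 2 (assumed subcommand, see note above): check recovery progress.
pd_unsafe show
```

When run through TiUP, the same commands go through the `ctl` component, as the "Starting component" line above shows.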
Sorry for the trouble, everyone.
No problem, as long as it is cleared up.