Abnormal TiKV Decommissioning

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv下线异常

| username: jackerzhou

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] v6.5.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact]
Taking a TiKV node offline triggered the following segfault:
rocksdb:low[256931]: segfault at 5649f56fd230 ip 00005649f56fd230 sp 00007f811b2d6458 error 15

[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
"store": {
  "id": 2,
  "address": "192.168.1.1:20160",
  "version": "6.5.0",
  "peer_address": "192.168.1.1:20160",
  "status_address": "192.168.1.1:20180",
  "git_hash": "47b81680f75adc4b7200480cea5dbe46ae07c4b5",
  "start_timestamp": 1679556168,
  "deploy_path": "/mnt/tidb-deploy/tikv-20160/bin",
  "last_heartbeat": 1679536854136206516,
  "state_name": "Offline"
},
"status": {
  "capacity": "0B",
  "available": "0B",
  "used_size": "0B",
  "leader_count": 0,
  "leader_weight": 1,
  "leader_score": 0,
  "leader_size": 0,
  "region_count": 2,
  "region_weight": 1,
  "region_score": 157,
  "region_size": 157,
  "witness_count": 0,
  "slow_score": 0,
  "start_ts": "2023-03-23T15:22:48+08:00",
  "last_heartbeat_ts": "2023-03-23T10:00:54.136206516+08:00"
}

| username: h5n1 | Original post link

  1. There is not enough information to determine the cause of the exception. Please describe the environment and post the TiKV logs so that others can analyze the problem.
  2. For the node being taken offline, has Region migration finished and has the store reached the Tombstone state? It currently shows "region_count": 2. If these 2 Regions cannot be migrated, try removing their peers manually in pd-ctl with operator add remove-peer <region_id> <store_id>.
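
The check in point 2 could be scripted roughly as below. This is only a minimal sketch, not an official tool. It assumes the store JSON has the shape posted above and that the list of leftover Regions comes from pd-ctl `region store <store_id>`. The helper name `pending_remove_peer_commands` and the Region IDs 1001 and 1002 are hypothetical and for illustration only; substitute the real IDs from your own cluster.

```python
import json
from typing import List


def pending_remove_peer_commands(store_info: dict, regions_on_store: dict) -> List[str]:
    """Build pd-ctl commands for peers still left on a store that is going offline.

    Returns an empty list when the store is already Tombstone or holds no Regions.
    """
    store_id = store_info["store"]["id"]
    state = store_info["store"]["state_name"]
    region_count = store_info["status"].get("region_count", 0)

    # Nothing to do once migration is complete.
    if state == "Tombstone" or region_count == 0:
        return []

    # One "operator add remove-peer <region_id> <store_id>" per leftover Region.
    return [
        f"operator add remove-peer {region['id']} {store_id}"
        for region in regions_on_store.get("regions", [])
    ]


# Subset of the store information from the original post.
store_info = json.loads(
    '{"store": {"id": 2, "state_name": "Offline"}, "status": {"region_count": 2}}'
)

# Hypothetical Region IDs; real ones would come from `region store 2` in pd-ctl.
regions_on_store = {"count": 2, "regions": [{"id": 1001}, {"id": 1002}]}

for command in pending_remove_peer_commands(store_info, regions_on_store):
    print(command)
```

The script only prints the commands. Review each one and run it by hand in pd-ctl, because removing a peer from a Region that has already lost other replicas can make things worse.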

| username: wzf0072 | Original post link

In Grafana, go to cluster -> clusterOverview -> PD and post the monitoring data for Abnormal stores and Region health.