Fatal Error: TiKV Cannot Start

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 致命错,tikv不能启动 (Fatal error, TiKV cannot start)

| username: 奋斗的大象

[FATAL] [server.rs:428] ["panic_mark_file /data12/tidb/data/tikv-20162/panic_mark_file exists, there must be something wrong with the db. Do not remove the panic_mark_file and force the TiKV node to restart. Please contact TiKV maintainers to investigate the issue. If needed, use scale in and scale out to replace the TiKV node"]
[Attachment: Screenshot/Log/Monitoring]

| username: tidb菜鸟一只 | Original post link

Just handle it by scaling out and then scaling in; that is the safest and quickest method.
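A typical replace-the-node flow with tiup looks roughly like the sketch below. The cluster name `tidb-cluster` and the node address are placeholders, not values from this thread; substitute your own.

```shell
# Sketch of replacing a failed TiKV node with tiup.
# "tidb-cluster" and the addresses below are placeholder assumptions.

# 1. Scale out: add a replacement TiKV instance described in a topology file.
tiup cluster scale-out tidb-cluster scale-out.yaml

# 2. Scale in: decommission the broken instance; PD migrates its regions away.
tiup cluster scale-in tidb-cluster --node 10.114.26.112:20162

# 3. Watch until the old store leaves "Pending Offline" and becomes "Tombstone".
tiup cluster display tidb-cluster
```

Keep the new instance running before scaling in the old one, so PD has somewhere to move the regions.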

| username: TI表弟 | Original post link

Use scale in and scale out to replace the TiKV node.

| username: TI表弟 | Original post link

The message says to scale in first and then scale out; you can give that a try.

| username: 奋斗的大象 | Original post link

[2024/05/22 12:07:06.165 +08:00] [INFO] [region_cache.go:2377] ["[health check] check health error"] [store=10.114.26.112:20162] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.114.26.112:20162: connect: connection refused\""]
[2024/05/22 12:07:06.165 +08:00] [INFO] [region_request.go:785] ["mark store's regions need be refill"] [id=183060412] [addr=10.114.26.112:20162] [error="context deadline exceeded"]

| username: zhaokede | Original post link

The first error log itself suggests replacing the TiKV node by scaling in and out.

| username: TiDBer_QYr0vohO | Original post link

Handle it by scaling out and then scaling in.

| username: 奋斗的大象 | Original post link

region_cache.go:2377] ["[health check] check health error"] [store=10.114.26.112:20161] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.114.26.112:20161: connect: connection refused\""]

| username: 奋斗的大象 | Original post link

Port conflict for '20162' between 'tikv_servers:10.114.26.112.port'

| username: Billmay表妹 | Original post link

What was your deployment like before the issue occurred?

Can you take a screenshot of your machine configuration? How many TiKV instances?

| username: 奋斗的大象 | Original post link

It was working fine before with 7 machines. Yesterday one machine crashed, and today it won't restart. Scaling out also reports a port conflict: "code": 1, "error": "port conflict for '20162' between 'tikv_servers:10.114.26.112.port' and 'tikv_servers:10.114.26.112.port'".

| username: tony5413 | Original post link

Use scale in and scale out to replace the TiKV node.

| username: 奋斗的大象 | Original post link

The TiKV decommission failed; the node still shows as linux/x86_64, Pending Offline.
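Pending Offline means PD is still migrating the store's regions away; with too few healthy stores to receive them, it can stay in that state. One way to check progress is with pd-ctl, sketched below; the version tag, PD address, and store ID are assumptions (the ID is taken from the earlier log line), so replace them with your own.

```shell
# Inspect the offline store's state and remaining region count.
# v6.5.0, <pd-host>, and the store ID are placeholders -- adjust to your cluster.
tiup ctl:v6.5.0 pd -u http://<pd-host>:2379 store 183060412
```

If `region_count` keeps dropping, the scale-in is progressing and just needs time; if it is stuck, check that the remaining TiKV stores have capacity to absorb the regions.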

| username: tidb菜鸟一只 | Original post link

If you are scaling out on the same machine, you need to change the port. Since you have 7 TiKV instances and one of them panicked, you can try scaling that one in and then scaling out again.
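When the replacement instance lands on the same host, every port and directory must differ from the old instance's, otherwise tiup reports exactly the port conflict shown above. A minimal `scale-out.yaml` sketch follows; the specific ports and paths here are assumptions, so adjust them to your layout.

```yaml
# Hypothetical replacement TiKV instance on the same host.
tikv_servers:
  - host: 10.114.26.112
    port: 20163          # must differ from the old instance's 20162
    status_port: 20183   # the status port must also be unique on this host
    data_dir: /data13/tidb/data/tikv-20163
    deploy_dir: /data13/tidb/deploy/tikv-20163
```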

| username: TiDBer_q2eTrp5h | Original post link

You can try scaling in first and then scaling out. I ran into this issue once before and solved it this way.

| username: TIDB-Learner | Original post link

I am wondering whether the problematic TiKV instance can be scaled in successfully after the scale-out completes.