Urgent Help Needed: TiDB Fails to Start After Forcibly Taking TiKV Offline, Keeps Trying to Connect to the Offline TiKV. Has Anyone Encountered This?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 紧急求救 ,tikv 强制下线后 tidb 启动失败,还是会不断的链接下线的tikv ,各位大佬有遇到的么

| username: 爱白话的晓辉

After forcibly taking a node offline, TiDB fails to start, and the logs show that it continuously tries to connect to the offline node. How can this be fixed?

TiDB startup failure log:
[screenshot: TiDB startup failure log]

| username: xfworld | Original post link

What is the current status of the cluster?

| username: 爱白话的晓辉 | Original post link

[screenshot of the current cluster status]

| username: zhimadi | Original post link

It may take a while; the node probably hasn't finished going offline yet, since taking a store offline is asynchronous.
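To check the offline progress, you can watch the store's state and region count drain via pd-ctl. This is a sketch: the PD address, the `v6.5.0` component version, and store id `4` are placeholders to adjust for your deployment.

```shell
# List all stores; a store being taken offline shows state "Offline" until
# all of its regions have migrated away, at which point it becomes "Tombstone".
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store

# Inspect a single store (id 4 is hypothetical) and watch its region_count
# decrease toward zero as regions are rescheduled to other TiKV nodes.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store 4
```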

| username: TiDBer_jYQINSnf | Original post link

The TiDB node shouldn't go down because of this. Check the TiDB node logs — the log you posted isn't a fatal error, is it? That alone shouldn't cause TiDB to exit.

| username: Kongdom | Original post link

You probably hit this situation: in your screenshot, one of the TiKV stores is in the Pending Offline state.

| username: Kongdom | Original post link

You can refer to this SOP:

| username: 爱白话的晓辉 | Original post link

I am now trying to scale out the previously decommissioned node, but it prompts me that there is a node information conflict and does not allow the installation.
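If the removed instance is still recorded in the topology (e.g. as Tombstone), the host/port conflict can usually be cleared by pruning before scaling out again. A sketch, assuming a tiup-managed deployment; `tidb-cluster` is a placeholder cluster name:

```shell
# Check the topology; the removed TiKV may still be listed (often as Tombstone).
tiup cluster display tidb-cluster

# Remove Tombstone instances from the topology so the host/port can be reused.
tiup cluster prune tidb-cluster
```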

| username: 爱白话的晓辉 | Original post link

Where can I delete that part of the metadata in PD?

| username: Kongdom | Original post link

Is this a production environment? Don't operate on it like this — wait until the store has finished going offline.

| username: tidb菜鸟一只 | Original post link

Do you only have 3 TiKV nodes? Then you can't scale one in, right? Won't the cluster break? You can no longer guarantee three replicas!

| username: xingzhenxiang | Original post link

With one node gone, the remaining TiKV stores can no longer satisfy the requirement of 3 replicas.
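You can confirm the configured replica count from PD. A sketch; the PD address and component version are assumptions:

```shell
# Show replication settings; max-replicas defaults to 3, which needs at
# least 3 healthy TiKV stores to place all replicas of every region.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 config show replication
```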

| username: ffeenn | Original post link

If you scale back out on the same node after scaling it in, use a different port when scaling out.
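A scale-out on the same host with a different port might look like this. A sketch only: the hostname, ports, directories, and cluster name are all placeholders:

```shell
# scale-out.yaml: reuse host 10.0.1.5 but with a new port and new directories
# (the old instance is assumed to have used 20160/20180).
cat > scale-out.yaml <<'EOF'
tikv_servers:
  - host: 10.0.1.5
    port: 20161
    status_port: 20181
    deploy_dir: /data/tikv-20161/deploy
    data_dir: /data/tikv-20161/data
EOF

# Apply the new topology to the existing cluster.
tiup cluster scale-out tidb-cluster scale-out.yaml
```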

| username: Kongdom | Original post link

Has the offline process finished after a whole night?

| username: 考试没答案 | Original post link

TiKV's multi-replica majority (Raft) protocol ensures that forcibly taking one replica offline will not affect service — the cluster can still serve normally. So this is clearly an operational issue, right?

| username: 考试没答案 | Original post link

TiDB supports load balancing and automatic failover; in addition, the TiDB server itself is stateless.

| username: 考试没答案 | Original post link

Please post the command you used to force the node offline. I'd like to try it on my test cluster.

| username: dba-kit | Original post link

Find the store ID of the offline node, then use `pd-ctl store cancel-delete <store_id>`.
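A concrete sequence might look like this. The PD address, component version, and store id `4` are assumptions; `cancel-delete` is only useful while the store is still in the Offline (not Tombstone) state:

```shell
# Find the store id of the offline TiKV by matching its address in the output.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store

# Cancel the pending deletion so the store returns to the Up state.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store cancel-delete 4
```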

| username: TI表弟 | Original post link

I have a post you can take a look at. To find the store ID of the offline node: each TiKV instance corresponds to a store_id. Then use `pd-ctl store delete <store_id>` to remove it.
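That removal could be issued as below (addresses and ids are placeholders). Note that `store delete` only marks the store Offline and starts migrating its regions away; it does not drop the store instantly:

```shell
# Map each TiKV address to its store_id.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store

# Mark store 4 for deletion; PD then schedules its regions onto other stores.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store delete 4
```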

| username: 考试没答案 | Original post link

You can use the pd-ctl commands. If you are sure you no longer need this TiKV node, use the unsafe command to remove it.

The delete command may report success, but the store may not actually be removed.
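The "unsafe command" referred to here is presumably PD's Online Unsafe Recovery (available since v5.3). Use it only when the failed stores are truly unrecoverable, since it can lose data; the PD address and store id are placeholders:

```shell
# Tell PD the listed stores are permanently lost (store id 4 is hypothetical);
# PD will rebuild region metadata without those replicas.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 unsafe remove-failed-stores 4

# Check the recovery progress.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 unsafe remove-failed-stores show
```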