After a TiDB Node Failure, Its Information Still Exists on PD

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb节点故障后pd上还有它的信息

| username: Hacker_qeCXjAN2

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
A TiDB node failed, and after the machine was taken away, PD was restarted. However, PD still retains the information of the original TiDB node and keeps printing this log:
[2023/11/09 13:35:45.280 +08:00] [WARN] [proxy.go:181] ["fail to recv activity from remote, stay inactive and wait to next checking round"] [remote=192.168.1.6:4000] [interval=2s] [error="dial tcp 192.168.1.6:4000: connect: no route to host"]
How can I remove it?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: 普罗米修斯 | Original post link

Has the TiDB node been taken offline from the cluster?

| username: Hacker_qeCXjAN2 | Original post link

After the machine failed, it could no longer be reached and was taken away outright, so scale-in couldn't be executed. Later, I modified the configuration in tiup to remove all the 192.168.1.6 nodes and restarted the entire cluster, but the PD log still shows this node.

| username: Fly-bird | Original post link

Can it still be seen in tiup cluster display?

| username: Hacker_qeCXjAN2 | Original post link

No, it is not visible in the tiup cluster display output.

| username: h5n1 | Original post link

Remove the corresponding address in run_pd.sh under the deployment directory, and then perform a rolling restart of PD.
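
For the rolling-restart part of this suggestion, a minimal sketch using tiup's role filter, where <cluster-name> is a placeholder for the actual cluster name (the run_pd.sh edit itself is done by hand on each PD host):

    # Restart only the PD components of the cluster; other components are left alone
    tiup cluster restart <cluster-name> -R pd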

| username: Hacker_qeCXjAN2 | Original post link

I didn’t quite understand. I have seen run_pd.sh before, but there is no address information for the TiDB node in it.

| username: h5n1 | Original post link

Did I misread it? After the TiDB node failed, you only restarted PD and never scaled in the TiDB node?

| username: Hacker_qeCXjAN2 | Original post link

I tried to scale it in, but it didn't succeed: the node couldn't be connected, and there was an error at the stop step.

| username: h5n1 | Original post link

Check with tiup cluster display.
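
For reference, a minimal sketch of that check, with <cluster-name> as a placeholder:

    # Show every instance tiup manages for this cluster, together with its status
    tiup cluster display <cluster-name>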

| username: Hacker_qeCXjAN2 | Original post link

[Screenshot of the tiup cluster display output]

| username: h5n1 | Original post link

Also capture the status further down in the output. The TiDB node can be scaled in using --force.
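
A sketch of the forced scale-in being suggested here, assuming the failed instance is the 192.168.1.6:4000 TiDB server from the log and that <cluster-name> stands in for the real cluster name:

    # Force-remove the unreachable TiDB instance from the cluster topology
    # --force skips the normal stop step on the (now unreachable) host
    tiup cluster scale-in <cluster-name> --node 192.168.1.6:4000 --force

Because --force removes the node's metadata even though the host cannot be reached, the stop-step failure seen earlier no longer blocks the operation.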

| username: Hacker_qeCXjAN2 | Original post link

The default value of tidb_gc_life_time is 10m, which means that the data can be rolled back within 10 minutes. If the data is deleted for more than 10 minutes, it cannot be rolled back. You can adjust the value of tidb_gc_life_time according to your needs.

| username: Jellybean | Original post link

Since the machine has already been taken away, you can add the --force parameter when executing scale-in to clean up the residual registration information in the cluster.

| username: tidb菜鸟一只 | Original post link

If the node is already damaged, using --force during scale-in will delete the node metadata directly.

| username: Kongdom | Original post link

This indicates that the scale-in was not successful. Manually modifying the configuration won't work; you still need to execute scale-in with the --force option to force it through.

| username: TiDBer_小阿飞 | Original post link

Did you force it offline as the previous poster suggested?

| username: Soysauce520 | Original post link

If PD still has the TiDB node's information after performing the scale-in with the --force option, you need to use pd-ctl to remove it.

| username: Hacker_qeCXjAN2 | Original post link

Could you please tell me the specific command? I only see a delete command and don’t know how to pass the parameters.

| username: Soysauce520 | Original post link

I misunderstood. The "remove" command is for TiKV. After you executed the scale-in with the --force option, can you still see the TiDB server in the PD panel of the monitoring dashboard?
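
For context, the store-level delete being referred to applies to TiKV stores registered in PD, not to a TiDB server; a minimal sketch with pd-ctl run through tiup, where the PD address, cluster version, and store ID are placeholders:

    # List the stores PD tracks (TiKV/TiFlash only; TiDB servers are not stores)
    tiup ctl:v5.0.0 pd -u http://<pd-host>:2379 store
    # Mark a store for removal by its numeric ID (TiKV only, not the TiDB instance in question)
    tiup ctl:v5.0.0 pd -u http://<pd-host>:2379 store delete <store-id>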