TiKV Error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv 报错

| username: 巨化斑鸠

tikv.log keeps reporting errors.

The node at XXX.XXX.XXX.XXX was taken offline two years ago (the exact procedure used is unclear).
The status of this node cannot be found using tiup cluster display.

How can I resolve this and remove these errors?

Logs:
[2023/03/31 15:31:24.410 +08:00] [INFO] [] ["Failed to connect to channel, retrying"]
[2023/03/31 15:31:24.505 +08:00] [WARN] [raft_client.rs:199] ["send to XXX.XXX.XXX.XXX:20174 fail, the gRPC connection could be broken"]
[2023/03/31 15:31:24.505 +08:00] [INFO] [transport.rs:144] ["resolve store address ok"] [addr=10.2.13.81:20172] [store_id=442332]
[2023/03/31 15:31:24.505 +08:00] [ERROR] [transport.rs:163] ["send raft msg err"] [err="Other(\"[src/server/raft_client.rs:208]: RaftClient send fail\")"]
[2023/03/31 15:31:24.505 +08:00] [INFO] [raft_client.rs:48] ["server: new connection with tikv endpoint"] [addr=10.2.13.81:20172]
[2023/03/31 15:31:24.506 +08:00] [INFO] [] ["Connect failed: {\"created\":\"@1680247884.506287486\",\"description\":\"Failed to connect to remote host: Connection refused\",\"errno\":111,\"file\":\"/rust/registry/src/github.com-1ecc6299db9ec823/grpcio-sys-0.5.3/grpc/src/core/lib/iomgr/tcp_client_posix.cc\",\"file_line\":200,\"os_error\":\"Connection refused\",\"syscall\":\"connect\",\"target_address\":\"ipv4:XXX.XXX.XXX.XXX:20172\"}"]
[2023/03/31 15:31:24.506 +08:00] [INFO] [] ["Subchannel 0x7f1b93967080: Retry in 999 milliseconds"]
[2023/03/31 15:31:24.506 +08:00] [WARN] [raft_client.rs:296] ["RPC batch_raft fail"] [err="Some(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"failed to connect to all addresses\") }))"] [sink_err="Some(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"failed to connect to all addresses\") })))"] [to_addr=XXX.XXX.XXX.XXX:20172]

| username: xfworld | Original post link

Refer to this:


If the above is sufficient, then you don’t need the one below.

And this one:

| username: 考试没答案 | Original post link

Use the unsafe command; enter the PD console (pd-ctl).
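For reference, a minimal sketch of opening the PD console through tiup; the cluster version and PD address below are placeholders, not values taken from this thread:

```
# Open the interactive PD console (pd-ctl) bundled with tiup.
# Replace <cluster-version> and <pd-ip> with your own values.
tiup ctl:<cluster-version> pd -u http://<pd-ip>:2379
```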

| username: 巨化斑鸠 | Original post link

I executed tiup cluster display to check the status of the offline node (expecting to wait for it to become Tombstone), but there is no node information for XXX.XXX.XXX.XXX in the output.

However, the background log is still reporting these errors continuously.
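For reference, the status check mentioned above looks like this (the cluster name is a placeholder):

```
# Show every node in the cluster and its current status.
# A scaled-in TiKV node normally moves to Tombstone before it
# disappears from this list entirely.
tiup cluster display <cluster-name>
```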

| username: 巨化斑鸠 | Original post link

How exactly do I do that?

| username: xfworld | Original post link

If it’s a production environment, it’s recommended to back up first.

| username: 巨化斑鸠 | Original post link

We don't have a second production environment. Is the risk high?

I just checked the audit history.

On 2021-03-19, scale-in was executed on the XXX.XXX.XXX.XXX node.
Also, edit-config does not contain any information for the XXX.XXX.XXX.XXX node.

| username: xfworld | Original post link

It’s already 2023… :rofl:

| username: 巨化斑鸠 | Original post link

What do you mean about 2023?

| username: Min_Chen | Original post link

Hello,

You can first use pd-ctl to check whether this store still exists in PD. Refer to the command: store [delete | cancel-delete | label | weight | remove-tombstone | limit] <store_id> [--jq="<query string>"].

If there is a residual store, you can delete it.
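A minimal sketch of that check and cleanup inside pd-ctl, assuming the leftover store is the one whose ID (442332) appears in the log above; verify the ID against the store listing before deleting anything:

```
# List all stores known to PD and look for the address of the
# long-removed node.
store

# Delete the residual store by ID (442332 comes from the log above;
# confirm it in the store listing first).
store delete 442332

# After the store reaches Tombstone state, clear it out of PD.
store remove-tombstone
```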