TiKV cannot connect to PD

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv无法连接到 pd

| username: simonzhang

After installing the TiDB cluster using tidb-operator, PD starts normally, but TiKV cannot connect to PD and shows the following error:

[INFO] [util.rs:567] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: "Deadline Exceeded", details: }))"] [endpoints=http://tidb-cluster-pd:2379]

Inside the container, http://tidb-cluster-pd:2379/pd/api/v1/stores can be accessed successfully.

[root@tidb-cluster-tikv-0 /]# curl http://tidb-cluster-pd:2379/pd/api/v1/stores
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 2001,
        "address": "tidb-cluster-tikv-0.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "version": "7.1.2",
        "peer_address": "tidb-cluster-tikv-0.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "status_address": "tidb-cluster-tikv-0.tidb-cluster-tikv-peer.tidb-cluster.svc:20180",
        "git_hash": "8632b3952d931e510d00953f89477ce095b3d902",
        "start_timestamp": 1703756506,
        "deploy_path": "/",
        "last_heartbeat": 1703812921932741743,
        "node_state": 1,
        "state_name": "Down"
      },
      "status": {
        "capacity": "0B",
        "available": "0B",
        "used_size": "0B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 6,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "start_ts": "2023-12-28T17:41:46+08:00",
        "last_heartbeat_ts": "2023-12-29T09:22:01.932741743+08:00",
        "uptime": "15h40m15.932741743s"
      }
    },
    {
      "store": {
        "id": 2004,
        "address": "tidb-cluster-tikv-2.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "version": "7.1.2",
        "peer_address": "tidb-cluster-tikv-2.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "status_address": "tidb-cluster-tikv-2.tidb-cluster-tikv-peer.tidb-cluster.svc:20180",
        "git_hash": "8632b3952d931e510d00953f89477ce095b3d902",
        "start_timestamp": 1703756505,
        "deploy_path": "/",
        "last_heartbeat": 1703812921542561872,
        "node_state": 1,
        "state_name": "Down"
      },
      "status": {
        "capacity": "0B",
        "available": "0B",
        "used_size": "0B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 6,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "start_ts": "2023-12-28T17:41:45+08:00",
        "last_heartbeat_ts": "2023-12-29T09:22:01.542561872+08:00",
        "uptime": "15h40m16.542561872s"
      }
    },
    {
      "store": {
        "id": 3001,
        "address": "tidb-cluster-tikv-1.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "version": "7.1.2",
        "peer_address": "tidb-cluster-tikv-1.tidb-cluster-tikv-peer.tidb-cluster.svc:20160",
        "status_address": "tidb-cluster-tikv-1.tidb-cluster-tikv-peer.tidb-cluster.svc:20180",
        "git_hash": "8632b3952d931e510d00953f89477ce095b3d902",
        "start_timestamp": 1703756506,
        "deploy_path": "/",
        "last_heartbeat": 1703812921870843688,
        "node_state": 1,
        "state_name": "Down"
      },
      "status": {
        "capacity": "0B",
        "available": "0B",
        "used_size": "0B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 6,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "start_ts": "2023-12-28T17:41:46+08:00",
        "last_heartbeat_ts": "2023-12-29T09:22:01.870843688+08:00",
        "uptime": "15h40m15.870843688s"
      }
    }
  ]
}
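
The response above is easier to scan when reduced to one row per store. The following is a minimal Python sketch, not part of the original post, that does this for an already-parsed response. The function name `summarize_stores` is hypothetical. It reads only fields visible in the output above, and it assumes `last_heartbeat` is a Unix timestamp in nanoseconds, which is consistent with the `last_heartbeat_ts` values shown.

```python
from datetime import datetime, timezone


def summarize_stores(payload: dict) -> list[dict]:
    """Reduce a /pd/api/v1/stores response to the fields relevant here."""
    summary = []
    for item in payload.get("stores", []):
        store = item.get("store", {})
        status = item.get("status", {})
        # Assumption: last_heartbeat is nanoseconds since the Unix epoch.
        heartbeat_ns = store.get("last_heartbeat", 0)
        heartbeat = datetime.fromtimestamp(
            heartbeat_ns // 1_000_000_000, tz=timezone.utc
        )
        summary.append(
            {
                "id": store.get("id"),
                "address": store.get("address"),
                "state": store.get("state_name"),
                "capacity": status.get("capacity"),
                "last_heartbeat_utc": heartbeat.isoformat(),
            }
        )
    return summary
```

Applied to the output in this post (for example after `json.loads` on the saved response), every row would show state `Down` and capacity `0B`, with the last heartbeat at about 09:22 on 2023-12-29 (+08:00).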

| username: dba远航 | Original post link

First, test if the connection is normal, then confirm if SSH can connect.

| username: 哈喽沃德 | Original post link

The TiKV node may be unable to communicate with the PD node. To narrow it down:

  1. Execute ping tidb-cluster-pd on the TiKV node to confirm whether the PD node can be pinged.
  2. Execute telnet tidb-cluster-pd 2379 on the TiKV node to confirm whether a TCP connection to the PD node can be established.
  3. Check the firewall settings to ensure that the firewall is not blocking network connections between nodes.
  4. Check the PD node logs to confirm whether there is any relevant error information.
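
If `telnet` is not installed in the TiKV container, the TCP check in step 2 can be approximated with a few lines of Python. This is only a sketch using the standard `socket` module; the function name `can_connect` is hypothetical, and the host and port in the usage note are taken from the endpoint in the error log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Running `can_connect("tidb-cluster-pd", 2379)` inside the TiKV pod corresponds to the telnet test. A `True` result only shows that the port is reachable at the TCP level; it does not prove that gRPC requests to PD complete.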

| username: zhang_2023 | Original post link

First, check if the network is connected.

| username: yulei7633 | Original post link

Check the network and SSH.

| username: simonzhang | Original post link

Executing curl http://tidb-cluster-pd:2379/pd/api/v1/stores on the TiKV node receives a normal response, indicating that there are no network issues.

| username: simonzhang | Original post link

On the surface, HTTP is working, but the gRPC call is timing out.

| username: come_true | Original post link

First check the network, then check the protocol, and finally check the configuration files.

| username: 像风一样的男子 | Original post link

Are the network components of k8s running normally?
Very few people in the community use k8s to install TiDB, so if there are issues, you need to consult k8s experts.

| username: 双开门变频冰箱 | Original post link

Check whether the firewall is still enabled or the required port is closed.

| username: Miracle | Original post link

Run time dig tidb-cluster-pd inside the TiKV container and see how long the lookup takes.
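
If `dig` is not available in the container, a rough equivalent is to time a name lookup from Python. The sketch below is an approximation, not the same measurement: `socket.getaddrinfo` goes through the system resolver (including `/etc/hosts` and search domains), whereas `dig` queries a DNS server directly. The function name `time_lookup` and its `resolver` parameter are hypothetical, added only so the lookup can be swapped out.

```python
import socket
import time


def time_lookup(host, port=2379, resolver=socket.getaddrinfo):
    """Resolve host and return (elapsed_seconds, sorted unique addresses)."""
    start = time.perf_counter()
    results = resolver(host, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in results})
    return elapsed, addresses
```

Calling `time_lookup("tidb-cluster-pd")` inside the TiKV container returns the elapsed time and the resolved addresses. A lookup that takes seconds rather than milliseconds would point toward cluster DNS as a suspect.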

| username: kelvin | Original post link

Check the network and SSH.

| username: seiang | Original post link

It feels like a firewall port restriction.

| username: 不想干活 | Original post link

Check if the firewall is blocking it.

| username: andone | Original post link

Check both the network and SSH.

| username: 哈喽沃德 | Original post link

Try troubleshooting it using my method first.