Simulating Single-Machine Deployment of a Production Cluster: TiKV Unable to Connect to PD

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 单机模拟部署生产集群,TiKV无法连接PD

| username: TiDBer_QYr0vohO

[TiDB Usage Environment] Production Environment / Testing / PoC
Testing
[TiDB Version]
v7.1.0
[Reproduction Path] What operations were performed when the issue occurred
None; this is a single-node deployment of a TiDB cluster.
[Encountered Issue: Issue Phenomenon and Impact]
When starting the cluster with tiup cluster start, the PD node started successfully, but the logs show that TiKV cannot connect to the PD node.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]




| username: Billmay表妹

Based on the provided information, it appears that TiKV is unable to connect to the PD node, possibly due to incorrect configuration of the PD node’s IP address or port number. It is recommended to troubleshoot by following these steps:

  1. Confirm whether the IP address and port number of the PD node are correctly configured. You can check the IP address and port number of the PD node using the following command:

    tiup cluster display <cluster-name>
    

    Here, <cluster-name> is the name of your deployed TiDB cluster.

  2. Confirm that the PD IP address and port number used by the TiKV node are configured correctly. You can review the cluster configuration with the following command:

    tiup cluster edit-config <cluster-name>
    

    Then check the pd_servers and tikv_servers entries and confirm that the PD node’s IP address and port number are correct.

  3. If both of the above checks pass, try restarting the TiKV node to see whether it can then connect to the PD node. You can restart just that TiKV node with the following command (or use -R tikv to restart all TiKV instances):

    tiup cluster restart <cluster-name> -N <tikv-node-id>
    

    Here, <cluster-name> is the name of your deployed TiDB cluster, and <tikv-node-id> is the TiKV node’s ID in host:port form (for example, 192.168.150.100:20160), as listed in the ID column of the output of:

    tiup cluster display <cluster-name>
    

    If the TiKV node still cannot connect to the PD node after restarting, you can check the TiKV node’s log file to confirm the specific error information.
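
When the TiKV log is long, filtering it down to error-level lines makes the connection failure easier to spot. The following is a minimal Python sketch, assuming the log uses TiDB’s unified log format (level in square brackets, such as [ERROR]); error_lines is a hypothetical helper name, and the log path is whatever your deployment uses.

```python
import re
import sys
from typing import Iterable, List

# Assumption: the log uses TiDB's unified log format, in which the level sits in
# square brackets, e.g. "[2023/06/01 10:00:00.000 +08:00] [ERROR] [file.rs:42] [...]".
LEVEL_RE = re.compile(r"\[(ERROR|FATAL)\]")


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return only the ERROR- or FATAL-level lines, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tikv_errors.py /path/to/tikv.log   (path is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for line in error_lines(f):
            print(line)
```

Reading the file line by line keeps memory use flat even for large logs, which matters on a machine that may already be short on memory.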

| username: TiDBer_QYr0vohO

  1. tiup cluster display tidb-test
    PD’s IP and ports: 192.168.150.100, 2379/2380 (client/peer)

  2. tiup cluster edit-config tidb-test

  3. tiup cluster restart hangs at TiKV. I opened another terminal and ran telnet against port 2379, and PD appears to have started normally.



| username: dockerfile

Check the PD logs first to resolve the PD startup issue.

| username: 有猫万事足

I think you should check the firewall settings.

Also, a successful telnet 127.0.0.1 2379 does not mean that telnet 192.168.150.100 2379 will succeed. Try telnet 192.168.150.100 2379 directly: that is the address TiKV uses to reach PD, so it is the more convincing test.
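
The loopback-versus-LAN comparison can also be scripted. This is a minimal Python sketch, assuming Python 3 is available on the TiKV host; can_connect is a hypothetical helper name, and the address and port (192.168.150.100, 2379) come from this thread. It only checks whether a TCP handshake succeeds, not whether PD itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return False
```

For example, compare can_connect("127.0.0.1", 2379) with can_connect("192.168.150.100", 2379). If only the loopback probe succeeds, a firewall rule or the address PD listens on is the likely cause.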

| username: TiDB_C罗

Is it possible that the disk space is insufficient?

| username: linnana

It is very likely that the firewall is not turned off.

| username: TiDBer_QYr0vohO

Okay, thank you. I just tried, and it is not the firewall: the port is open after PD starts. However, the PD log shows a memory-related error: panic: runtime error: invalid memory address or nil pointer dereference.
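
To pull a panic like this, plus the first lines of its stack trace, out of a PD log, a small Python sketch can help. It assumes the panic was written in Go’s usual form, a line starting with "panic:" followed by goroutine frames. With tiup this output often lands in the PD instance’s stderr log (for example, a pd_stderr.log file in the instance’s log directory), but that location is an assumption to verify against your deploy directory. extract_panic is a hypothetical helper name.

```python
from typing import List


def extract_panic(text: str, context: int = 5) -> List[str]:
    """Return the first Go 'panic:' line plus up to `context` lines after it.

    Returns an empty list if the text contains no panic line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Go prints an unrecovered panic as a line beginning with "panic:".
        if line.lstrip().startswith("panic:"):
            return lines[i : i + 1 + context]
    return []
```

The lines after the panic message (the signal line and the top goroutine frames) usually show which PD code path crashed, which is more useful in a bug report than the panic message alone.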

| username: TiDBer_QYr0vohO

Indeed, PD started for a while and then reported an error.

| username: TiDBer_QYr0vohO

PD started for a while, then reported an error and exited, causing TiKV to be unable to connect.
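
Because PD came up and then exited, a single telnet check can pass simply because it ran at the wrong moment. The sketch below probes the port repeatedly and reports whether it was reachable and later stopped being reachable. watch_port and went_down are hypothetical helper names, and the default count and interval are arbitrary.

```python
import socket
import time
from typing import List


def watch_port(host: str, port: int, checks: int = 10, interval: float = 3.0,
               timeout: float = 1.0) -> List[bool]:
    """Probe host:port `checks` times, `interval` seconds apart; True means connected."""
    results: List[bool] = []
    for i in range(checks):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            results.append(False)
        if i < checks - 1:
            time.sleep(interval)
    return results


def went_down(results: List[bool]) -> bool:
    """True if the port was reachable at some point and unreachable afterwards."""
    if True not in results:
        return False
    first_up = results.index(True)
    return False in results[first_up + 1:]
```

For example, went_down(watch_port("192.168.150.100", 2379, checks=20)) returning True matches the pattern described here: PD starts, then exits shortly afterwards.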

| username: TiDBer_QYr0vohO

There is enough disk space.

| username: TiDBer_QYr0vohO

Thank you, everyone. It seems like there isn’t enough memory.

| username: Anna

Hahaha, from now on, check these first: 1. disk, 2. memory, 3. network, 4. firewall.
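
For item 1 of that checklist, free space on the filesystem that holds the data directory can be checked with Python’s standard library. This is a sketch: the path to pass in (such as /tidb-data, a common tiup data_dir) and the 10% threshold are assumptions, not TiDB requirements. check_disk is a hypothetical helper name.

```python
import shutil
from typing import Tuple


def check_disk(path: str = "/", min_free_ratio: float = 0.10) -> Tuple[bool, float]:
    """Return (ok, free_ratio) for the filesystem that contains `path`.

    min_free_ratio is an illustrative threshold, not a TiDB requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_ratio = usage.free / usage.total
    return free_ratio >= min_free_ratio, free_ratio
```

Checking the data directory rather than / matters when the data lives on a separate mount.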

| username: zhanggame1

Memory is worth checking. Use ‘top’ to see whether memory is fully used.
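
top shows this interactively. To script the same check, here is a Linux-only sketch that parses /proc/meminfo. It treats MemAvailable rather than MemFree as usable memory, because MemFree does not count reclaimable page cache and so understates what is actually available. parse_meminfo and memory_used_percent are hypothetical helper names.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values are in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_used_percent(info: Dict[str, int]) -> float:
    """Percent of RAM in use, counting MemAvailable as the usable remainder."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total
```

On the host, memory_used_percent(parse_meminfo(open("/proc/meminfo").read())) gives a single number; a value close to 100 while PD is starting points to the memory shortage found in this thread.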
