TiFlash not starting

Application environment:

Test

TiDB version: v7.1.0

Reproduction method:

Simulate production deployment on a single machine
Follow the below link

Problem:

We are testing TiDB. While installing a TiDB cluster with TiUP on a single machine, TiFlash does not start.

Error logs:

+ [ Serial ] - StartCluster
Starting component pd
Starting instance 172.178.79.212:2379
Start instance 172.178.79.212:2379 success
Starting component tikv
Starting instance 172.178.79.212:20162
Starting instance 172.178.79.212:20160
Starting instance 172.178.79.212:20161
Start instance 172.178.79.212:20162 success
Start instance 172.178.79.212:20160 success
Start instance 172.178.79.212:20161 success
Starting component tidb
Starting instance 172.178.79.212:4000
Start instance 172.178.79.212:4000 success
Starting component tiflash
Starting instance 172.178.79.212:9000

Error: failed to start tiflash: failed to start: 172.178.79.212 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 3930 to be started after 2m0s

Please find the tiflash_tikv.log and tiflash.log file attached.
tiflash_tikv.log (54.1 KB)
tiflash.log (379.5 KB)

Because of the TiFlash failure, we are not able to use TiDB.

Any help in resolving this issue would be appreciated.

Thanks in advance

Based on the error logs you provided, it seems that TiFlash failed to start on the specified instance. The error message indicates a timeout waiting for port 3930 to be started after 2 minutes.

To troubleshoot this issue, you can follow the steps below:

  1. Check the TiFlash log directory (/tidb-deploy/tiflash-9000/log) for more detailed error messages. The tiflash.log file you provided might contain useful information about the failure.

  2. Make sure that the TiFlash instance is properly configured and all necessary dependencies are installed. Refer to the TiDB documentation for the required system libraries and configurations for TiFlash.

  3. Verify that port 3930 is not in use by another process on the machine. You can check the port status with the netstat command.

  4. Check the system resource usage (CPU, memory, disk space) to ensure that there are no resource constraints causing the TiFlash startup failure.
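
The port check in step 3 can also be done without netstat. The following is a minimal sketch, not part of TiUP or TiFlash: the helper name `is_listening` is hypothetical, and it only detects a TCP listener that accepts connections at the given address.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused or timed-out connection yields False.
        return s.connect_ex((host, port)) == 0
```

For example, `is_listening("172.178.79.212", 3930)` returning False would suggest that nothing is accepting connections on that port, although it cannot distinguish a free port from an unreachable address.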

If the above steps do not resolve the issue, you can try the following:

  1. Upgrade your TiDB cluster to the latest version. There might be bug fixes or improvements related to TiFlash in newer versions.

  2. Contact PingCAP support or the community for further assistance. Provide them with the TiFlash log files and any other relevant information to help diagnose and resolve the issue.

Please note that the information provided here is based on the official documentation. If you encounter any issues or need further assistance, it is recommended to consult the official documentation or seek help from the PingCAP support or community.

There are two ERROR entries in the log file:

$ grep ERROR tiflash.log 
[2023/12/28 04:54:29.544 +00:00] [ERROR] [Server.cpp:381] ["/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/server/insecure/server_chttp2.cc, line number: 48, log msg : {\"created\":\"@1703739269.544447309\",\"description\":\"No address added out of total 1 resolved\",\"file\":\"/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc\",\"file_line\":936,\"referenced_errors\":[{\"created\":\"@1703739269.544439009\",\"description\":\"Unable to configure socket\",\"fd\":34,\"file\":\"/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":218,\"referenced_errors\":[{\"created\":\"@1703739269.544432108\",\"description\":\"Cannot assign requested address\",\"errno\":99,\"file\":\"/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":191,\"os_error\":\"Cannot assign requested address\",\"syscall\":\"bind\"}]}]}"] [source=grpc] [thread_id=1]
[2023/12/28 04:54:31.278 +00:00] [ERROR] [<unknown>] ["DB::Exception: Exception happens when start grpc server, the flash.service_addr may be invalid, flash.service_addr is 172.178.79.212:3930"] [source=Application] [thread_id=1]

Formatted for readability, the first error looks like this:

{'created': '@1703739269.544447309',
 'description': 'No address added out of total 1 resolved',
 'file': '/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc',
 'file_line': 936,
 'referenced_errors': [{'created': '@1703739269.544439009',
   'description': 'Unable to configure socket',
   'fd': 34,
   'file': '/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc',
   'file_line': 218,
   'referenced_errors': [{'created': '@1703739269.544432108',
     'description': 'Cannot assign requested address',
     'errno': 99,
     'file': '/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc',
     'file_line': 191,
     'os_error': 'Cannot assign requested address',
     'syscall': 'bind'}]}]}
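
The innermost entry of `referenced_errors` is the one that names the failing system call. As a rough sketch, the nesting can be walked programmatically; the helper name `root_cause` is hypothetical, and the payload below is an abbreviated copy of the log message (file paths, line numbers, and timestamps left out).

```python
import json

# Abbreviated copy of the gRPC payload from the first ERROR line
# (file paths, line numbers, and timestamps omitted for brevity).
payload = '''
{"description": "No address added out of total 1 resolved",
 "referenced_errors": [
   {"description": "Unable to configure socket",
    "fd": 34,
    "referenced_errors": [
      {"description": "Cannot assign requested address",
       "errno": 99,
       "os_error": "Cannot assign requested address",
       "syscall": "bind"}]}]}
'''


def root_cause(error: dict) -> dict:
    """Follow the first entry of referenced_errors down to the innermost error."""
    while error.get("referenced_errors"):
        error = error["referenced_errors"][0]
    return error


cause = root_cause(json.loads(payload))
print(cause["syscall"], cause["errno"], cause["os_error"])
# prints: bind 99 Cannot assign requested address
```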

Could it be that something else is already listening on that port (3930 or 9000)?

Hi dveenden

Thanks for the reply

I checked and there are no services running on ports 3930 and 9000

[mobius-prod@tidb-vm ~]$ sudo ss -tulpn | grep :3930
[mobius-prod@tidb-vm ~]$ sudo ss -tulpn | grep :9000
[mobius-prod@tidb-vm ~]$

Could you try this:

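#!/usr/bin/env python3
# Try to bind the address TiFlash is configured to use (flash.service_addr).
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # bind() raises OSError on failure; this is the call that failed in the log.
    s.bind(('172.178.79.212', 3930))
    print(f"bind ok: {s}")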

This tries to do the same operation. Maybe it is a permission issue or a typo in the IP address?
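
To tell those cases apart, the bind test can report the symbolic errno name instead of a traceback. This is only a sketch: the helper name `probe_bind` is hypothetical, and the comments describe the usual Linux meanings. On Linux, errno 99 ("Cannot assign requested address") corresponds to EADDRNOTAVAIL, which usually means the IP is not assigned to any local interface, whereas a port that is already taken would normally produce EADDRINUSE.

```python
import errno
import socket


def probe_bind(ip: str, port: int = 0) -> str:
    """Try to bind ip:port and return "ok" or the symbolic errno name."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, port))
        except OSError as exc:
            # Typical results on Linux:
            #   EADDRNOTAVAIL: the IP is not assigned to a local interface
            #   EADDRINUSE:    another socket already holds ip:port
            #   EACCES:        not permitted (typically ports below 1024)
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
```

Calling `probe_bind("172.178.79.212")` with the default port 0 lets the OS pick any free port, so the result reflects only whether the address itself can be bound. `probe_bind("172.178.79.212", 3930)` then adds the port to the test.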

What OS are you using?