Error: failed to start TiFlash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log (/tidb-deploy/tiflash-9000/log) for more details: timed out waiting for port 9000 to be started after 2m0s

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Error: failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s

| username: Jackhzf

【TiDB Usage Environment】 Testing environment
【TiDB Version】v6.1.0
【Encountered Problem】tiup deployment, tiflash failed to start
【Reproduction Path】tiup cluster start
【Problem Phenomenon and Impact】

| username: wuxiangdong | Original post link

Please share the tiflash_error.log file for review.

| username: h5n1 | Original post link

Check the ports and firewall.
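A quick way to do this check, assuming the default TiFlash ports (9000 here; 3930/20170/8123 are other common TiFlash defaults and may differ in your topology file):

```shell
# Is anything already bound to TiFlash's default ports?
ss -tlnp | grep -E ':(9000|3930|20170|8123)\b' || echo "ports free"

# On CentOS, check whether firewalld is running and which ports it allows
systemctl is-active firewalld && firewall-cmd --list-ports
```

If a port is already taken by another process, change it in the topology file; if firewalld is active, open the ports or stop the service in a test environment.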

| username: Jackhzf | Original post link

The files generated under the /tidb-deploy/tiflash-9000 directory grew rapidly and filled my entire 100 GB hard drive. I deleted everything in that folder, and now nothing is written there after starting. What are the files in this folder, and how do I configure TiFlash to keep writing logs there? I can't find tiflash_error.log anymore.

| username: cheng | Original post link

Create an empty tiflash_error.log file in the original directory, and it should work.
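A minimal sketch of that, using the paths from the post; `tidb` as the deploy user is an assumption, so use whichever user owns your deploy directory (check with `ls -ld /tidb-deploy/tiflash-9000`):

```shell
# Recreate the log directory and an empty error log so TiFlash can reopen it
mkdir -p /tidb-deploy/tiflash-9000/log
touch /tidb-deploy/tiflash-9000/log/tiflash_error.log

# Ownership must match the deploy user, otherwise TiFlash cannot write to it
chown -R tidb:tidb /tidb-deploy/tiflash-9000/log
```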

| username: Jackhzf | Original post link

d6634ee9e75d26", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2022-08-22T09:58:19.649+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2022-08-22T09:58:19.649+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116
github.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready
\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803
github.com/pingcap/tiup/pkg/cluster/operation.startInstance
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533
golang.org/x/sync/errgroup.(*Group).Go.func1
\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74
runtime.goexit
\truntime/asm_amd64.s:1571
failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.
failed to start tiflash"}
2022-08-22T09:58:19.649+0800 INFO Execute command finished {"code": 1, "error": "failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116
github.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready
\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803
github.com/pingcap/tiup/pkg/cluster/operation.startInstance
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533
golang.org/x/sync/errgroup.(*Group).Go.func1
\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74
runtime.goexit
\truntime/asm_amd64.s:1571
failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.
failed to start tiflash"}

| username: Jackhzf | Original post link

All files in /tidb-deploy/tiflash-9000 have been deleted, and the TiFlash node cannot be removed. How can this be resolved?
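One way out, sketched with a placeholder cluster name `mycluster` (find yours with `tiup cluster list`), is a forced scale-in, which removes the node from the topology even though its on-disk files are already gone. Note that TiFlash normally requires its table replicas to be dropped first (`ALTER TABLE <db>.<table> SET TIFLASH REPLICA 0;`):

```shell
# Force-remove the broken TiFlash node from the cluster topology
tiup cluster scale-in mycluster --node 182.92.101.109:9000 --force

# Verify the node is gone
tiup cluster display mycluster
```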

| username: Jackhzf | Original post link

This is the error log file. Could the experts please take a look and see why TiFlash failed to start?

| username: Jackhzf | Original post link

The number of files in the TiFlash deployment directory is increasing, and they don’t get deleted by themselves, causing the hard drive to fill up quickly. Does anyone know what’s going on? Is there a parameter that needs to be set to make it automatically delete unused files?
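To see what is actually consuming the space, a quick inspection like this can help; as later replies explain, repeated crashes each leave a core dump (often named `core` or `core.<pid>`) in the working directory:

```shell
# List the 20 largest files/directories under the deploy directory
du -ah /tidb-deploy/tiflash-9000 2>/dev/null | sort -rh | head -n 20

# Core dumps from crashes typically show up like this
ls -lh /tidb-deploy/tiflash-9000/core* 2>/dev/null
```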

| username: wish-PingCAP | Original post link

From the error message, it appears that TiFlash crashed during startup:

[2022/08/22 14:53:58.426 +08:00] [ERROR] [BaseDaemon.cpp:420] ["BaseDaemon:Attempted access has violated the permissions assigned to the memory area."] [thread_id=5]
[2022/08/22 14:53:59.947 +08:00] [ERROR] [BaseDaemon.cpp:570] ["BaseDaemon:\
       0x1ed2661\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+32319073]\
                \tlibs/libdaemon/src/BaseDaemon.cpp:221\
  0x7f2cbae9f5d0\t<unknown symbol> [libpthread.so.0+62928]\
       0x85d11e0\tgrpc_server_request_registered_call [tiflash+140317152]\
                \tcontrib/grpc/src/core/lib/surface/server.cc:0\
       0x855fbb6\tgrpc::ServerInterface::RegisteredAsyncRequest::IssueRequest(void*, grpc_byte_buffer**, grpc_impl::ServerCompletionQueue*) [tiflash+139852726]\
                \tcontrib/grpc/src/cpp/server/server_cc.cc:209\
       0x7ac6f6f\tgrpc::ServerInterface::PayloadAsyncRequest<mpp::EstablishMPPConnectionRequest>::PayloadAsyncRequest(grpc::internal::RpcServiceMethod*, grpc::ServerInterface*, grpc_impl::ServerContext*, grpc::internal::ServerAsyncStreamingInterface*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*, mpp::EstablishMPPConnectionRequest*) [tiflash+128741231]\
                \tcontrib/grpc/include/grpcpp/impl/codegen/server_interface.h:270\
       0x7ac5a40\tDB::EstablishCallData::EstablishCallData(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128735808]\
                \tdbms/src/Flash/EstablishCall.cpp:34\
       0x7ac5d3b\tDB::EstablishCallData::spawn(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128736571]\
                \tdbms/src/Flash/EstablishCall.cpp:44\
       0x1d638c5\tDB::Server::FlashGrpcServerHolder::FlashGrpcServerHolder(DB::Server&, DB::TiFlashRaftConfig const&, Poco::Logger*) [tiflash+30816453]\
                \tdbms/src/Server/Server.cpp:643\
       0x1d5ab9e\tDB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) [tiflash+30780318]\
                \tdbms/src/Server/Server.cpp:1401\
       0x7fe644a\tPoco::Util::Application::run() [tiflash+134112330]\
                \tcontrib/poco/Util/src/Application.cpp:335\
       0x7ff5c0c\tPoco::Util::ServerApplication::run(int, char**) [tiflash+134175756]\
                \tcontrib/poco/Util/src/ServerApplication.cpp:618\
       0x1d5e4ad\tmainEntryClickHouseServer(int, char**) [tiflash+30794925]\
                \tdbms/src/Server/Server.cpp:1549\
       0x1d1061e\tmain [tiflash+30475806]\
                \tdbms/src/Server/main.cpp:167\
  0x7f2cba8cf495\t__libc_start_main [libc.so.6+140437]"] [thread_id=5]

Could you please provide your hardware architecture and operating system information?

| username: Jackhzf | Original post link

The machine has 4 CPU cores and 8 GB of RAM. Is this configuration too low? What is the minimum configuration for a testing environment?
Another issue is that TiFlash fails to start and keeps writing files to its deployment directory, filling up my 100 GB hard drive in no time.

| username: ShawnYan | Original post link

A temporary solution is to prevent TiFlash from generating core dump files.

| username: wish-PingCAP | Original post link

TiFlash crashes every time it starts, and each crash generates a core dump file. You can search how to globally disable core dumps on CentOS so that it won’t fill up your disk.
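A sketch of a global disable on CentOS 7 (run as root; adjust for other distros). Note that services managed by systemd, such as tiflash-9000.service, do not read `limits.conf`, so the systemd default limit has to be capped as well:

```shell
# Disable core dumps for login shells
cat >> /etc/security/limits.conf <<'EOF'
* soft core 0
* hard core 0
EOF

# Disable core dumps for systemd-managed services
mkdir -p /etc/systemd/system.conf.d
printf '[Manager]\nDefaultLimitCORE=0\n' > /etc/systemd/system.conf.d/no-core.conf
systemctl daemon-reexec
```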

| username: ShawnYan | Original post link

Is there a parameter similar to abort-on-panic in TiFlash? I didn't see one mentioned in the official documentation.

TiFlash crashes automatically generate a core dump. Is there currently no TiFlash parameter to control this?

| username: wish-PingCAP | Original post link

The generation of core dumps is mainly controlled by the operating system (ulimit), and TiFlash will always abort on panic.

| username: Jackhzf | Original post link

How do you set this parameter?

| username: ShawnYan | Original post link

You can search for "ulimit -c 0" and core dump configuration for details.
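For example, to check and then disable core dumps for the current shell and anything launched from it:

```shell
ulimit -c        # show the current core-file size limit for this shell
ulimit -c 0      # disable core dumps for this shell and its children
ulimit -c        # prints 0
```

This only affects the current session; to make it persistent, set the limit in `/etc/security/limits.conf` (and in the systemd configuration for services).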

| username: ShawnYan | Original post link

Additionally, is there any plan to add a parameter in TiFlash to control abort on panic? As with TiKV, this can be controlled at the OS level, but it would be great if TiFlash itself could also control whether to generate a core.

| username: flow-PingCAP | Original post link

You can raise an issue in the pingcap/tiflash repository on GitHub.

| username: ShawnYan | Original post link

Raised an issue to track this: Support "abort on panic" in global settings · Issue #5946 · pingcap/tiflash · GitHub