Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Error: failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s
【TiDB Usage Environment】 Testing environment
【TiDB Version】v6.1.0
【Encountered Problem】tiup deployment, tiflash failed to start
【Reproduction Path】tiup cluster start
【Problem Phenomenon and Impact】
Please share the tiflash_error.log file for review.
Check the ports and firewall.
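If it helps, here is a minimal sketch for checking from another machine whether the TiFlash port accepts TCP connections at all. The function name port_is_open is made up for this example; the host and port in the usage comment are the ones from the error message above. A False result cannot tell you by itself whether the process is not listening or a firewall is dropping the connection, so treat it as a first check only.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (values taken from the error message in this thread):
#   port_is_open("182.92.101.109", 9000)
```

If the port is closed even when checked from the TiFlash host itself, the process most likely never got as far as listening, and the logs under /tidb-deploy/tiflash-9000/log are the next place to look.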
The files generated under the /tidb-deploy/tiflash-9000 directory increased rapidly, taking up my entire 100G hard drive. Later, I deleted all the files in this folder, and now after starting, nothing is being written into it. Could you please explain what the contents of the files in this folder are, and how to configure it to continue writing logs here? I can’t find the tiflash_error.log anymore.
Create an empty tiflash_error.log file in the original directory, and it should work.
d6634ee9e75d26", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2022-08-22T09:58:19.649+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2022-08-22T09:58:19.649+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116
github.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready
\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803
github.com/pingcap/tiup/pkg/cluster/operation.startInstance
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533
The Go Programming Language
\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74
runtime.goexit
\truntime/asm_amd64.s:1571
failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.
failed to start tiflash"}
2022-08-22T09:58:19.649+0800 INFO Execute command finished {"code": 1, "error": "failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116
github.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready
\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803
github.com/pingcap/tiup/pkg/cluster/operation.startInstance
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533
The Go Programming Language
\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74
runtime.goexit
\truntime/asm_amd64.s:1571
failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.
failed to start tiflash"}
All files in /tidb-deploy/tiflash-9000 have been deleted, and the TiFlash node cannot be removed. How can this be resolved?
This is the error log file. Could the experts please take a look and see why TiFlash failed to start?
The number of files in the TiFlash deployment directory is increasing, and they don’t get deleted by themselves, causing the hard drive to fill up quickly. Does anyone know what’s going on? Is there a parameter that needs to be set to make it automatically delete unused files?
From the error message, it appears that TiFlash crashed during startup:
[2022/08/22 14:53:58.426 +08:00] [ERROR] [BaseDaemon.cpp:420] ["BaseDaemon:Attempted access has violated the permissions assigned to the memory area."] [thread_id=5]
[2022/08/22 14:53:59.947 +08:00] [ERROR] [BaseDaemon.cpp:570] ["BaseDaemon:\
0x1ed2661\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+32319073]\
\tlibs/libdaemon/src/BaseDaemon.cpp:221\
0x7f2cbae9f5d0\t<unknown symbol> [libpthread.so.0+62928]\
0x85d11e0\tgrpc_server_request_registered_call [tiflash+140317152]\
\tcontrib/grpc/src/core/lib/surface/server.cc:0\
0x855fbb6\tgrpc::ServerInterface::RegisteredAsyncRequest::IssueRequest(void*, grpc_byte_buffer**, grpc_impl::ServerCompletionQueue*) [tiflash+139852726]\
\tcontrib/grpc/src/cpp/server/server_cc.cc:209\
0x7ac6f6f\tgrpc::ServerInterface::PayloadAsyncRequest<mpp::EstablishMPPConnectionRequest>::PayloadAsyncRequest(grpc::internal::RpcServiceMethod*, grpc::ServerInterface*, grpc_impl::ServerContext*, grpc::internal::ServerAsyncStreamingInterface*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*, mpp::EstablishMPPConnectionRequest*) [tiflash+128741231]\
\tcontrib/grpc/include/grpcpp/impl/codegen/server_interface.h:270\
0x7ac5a40\tDB::EstablishCallData::EstablishCallData(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128735808]\
\tdbms/src/Flash/EstablishCall.cpp:34\
0x7ac5d3b\tDB::EstablishCallData::spawn(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128736571]\
\tdbms/src/Flash/EstablishCall.cpp:44\
0x1d638c5\tDB::Server::FlashGrpcServerHolder::FlashGrpcServerHolder(DB::Server&, DB::TiFlashRaftConfig const&, Poco::Logger*) [tiflash+30816453]\
\tdbms/src/Server/Server.cpp:643\
0x1d5ab9e\tDB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) [tiflash+30780318]\
\tdbms/src/Server/Server.cpp:1401\
0x7fe644a\tPoco::Util::Application::run() [tiflash+134112330]\
\tcontrib/poco/Util/src/Application.cpp:335\
0x7ff5c0c\tPoco::Util::ServerApplication::run(int, char**) [tiflash+134175756]\
\tcontrib/poco/Util/src/ServerApplication.cpp:618\
0x1d5e4ad\tmainEntryClickHouseServer(int, char**) [tiflash+30794925]\
\tdbms/src/Server/Server.cpp:1549\
0x1d1061e\tmain [tiflash+30475806]\
\tdbms/src/Server/main.cpp:167\
0x7f2cba8cf495\t__libc_start_main [libc.so.6+140437]"] [thread_id=5]
Could you please provide your hardware architecture and operating system information?
The machine has 4 CPU cores and 8 GB of memory. Is this configuration too low? What is the minimum configuration for a testing environment?
Another issue is that TiFlash fails to start and keeps writing files to its deployment directory, filling up my 100GB hard drive in no time.
A temporary solution is to prevent TiFlash from generating core dump files.
TiFlash crashes every time it starts, and each crash generates a core dump file. You can search how to globally disable core dumps on CentOS so that it won’t fill up your disk.
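To confirm that core dump files are really what is filling the disk, you could list the largest files under the deployment directory. Below is a rough sketch; the function name largest_files is made up for this example, and the path in the usage comment is the deployment directory from this thread. Core files are often named core or core.<pid>, but that depends on the kernel.core_pattern setting on your system, so check the names you actually see.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are measured themselves, not followed.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return heapq.nlargest(top_n, sizes)


# Example usage (directory taken from this thread):
#   for size, path in largest_files("/tidb-deploy/tiflash-9000"):
#       print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```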
Is there a parameter in TiFlash similar to TiKV's abort-on-panic? I didn’t see one mentioned in the official documentation (the page I checked was "Learn about TiKV configuration file parameters").
TiFlash crashes will automatically generate a core dump. Is there currently no parameter to control this from TiFlash?
The generation of core dumps is mainly controlled by the operating system (ulimit), and TiFlash will always abort on panic.
How do you set this parameter?
You can search specifically for "ulimit -c 0" / core dump.
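For anyone wondering what that setting does underneath: ulimit -c changes the process's RLIMIT_CORE resource limit, and the kernel writes no core file when the soft limit is 0. Here is a small sketch of the same mechanism using Python's standard resource module (Unix only). The function name disable_core_dumps is made up for this example, and it is only an illustration of the limit, not a way to configure TiFlash.

```python
import resource
from typing import Tuple


def disable_core_dumps() -> Tuple[int, int]:
    """Set the soft core-file size limit to 0 for this process.

    Child processes started afterwards inherit the limit.
    Returns the (soft, hard) limits in effect after the change.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Lowering the soft limit is always permitted; the hard limit is kept.
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Two caveats. This sketch lowers only the soft limit, whereas ulimit -c 0 in bash without -S or -H lowers both. Also, the limit applies to the process that sets it and its children, and the error message above shows TiFlash running as tiflash-9000.service, so what matters is the limit the service is started with, not the one in your login shell. In systemd that is the LimitCORE= unit setting.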
Additionally, is there any plan to add a parameter to TiFlash to control abort on panic, similar to TiKV? Besides controlling core dumps at the OS level, it would be great if TiFlash itself could control whether a core dump is generated.
You can raise an issue on GitHub.