TiKV 7.5.1 Unexpected Restart

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv 7.5.1 异常重启

| username: 田帅萌7

【TiDB Version】7.5.1
【Encountered Issue: Issue Phenomenon and Impact】

The abnormal TiKV restart log is as follows:
[2024/05/04 17:25:35.249 +08:00] [FATAL] [lib.rs:510] ["called `Result::unwrap()` on an `Err` value: Custom { kind: Uncategorized, error: "fdatasync" }"] [backtrace=" 0: tikv_util::set_panic_hook::{{closure}}\n at /workspace/source/tikv/components/tikv_util/src/lib.rs:509:18\n 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2032:9\n std::panicking::rust_panic_with_hook\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:692:13\n 2: std::panicking::begin_panic_handler::{{closure}}\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:579:13\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18\n 4: rust_begin_unwind\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:575:5\n 5: core::panicking::panic_fmt\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:65:14\n 6: core::result::unwrap_failed\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1791:5\n 7: core::result::Result<T,E>::unwrap\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1113:23\n raft_engine::file_pipe_log::log_file::LogFileWriter::sync\n at /workspace/.cargo/git/checkouts/raft-engine-35ec7b0b2c07ddd2/e505d63/src/file_pipe_log/log_file.rs:125:9\n 8: raft_engine::file_pipe_log::pipe::SinglePipe::sync\n at /workspace/.cargo/git/checkouts/raft-engine-35ec7b0b2c07ddd2/e505d63/src/file_pipe_log/pipe.rs:401:9\n <raft_engine::file_pipe_log::pipe::DualPipes as raft_engine::pipe_log::PipeLog>::sync\n at /workspace/.cargo/git/checkouts/raft-engine-35ec7b0b2c07ddd2/e505d63/src/file_pipe_log/pipe.rs:511:9\n raft_engine::engine::Engine<F,P>::write\n at /workspace/.cargo/git/checkouts/raft-engine-35ec7b0b2c07ddd2/e505d63/src/engine.rs:177:21\n 9: <raft_log_engine::engine::RaftLogEngine as engine_traits::raft_engine::RaftEngine>::consume_and_shrink\n at /workspace/source/tikv/components/raft_log_engine/src/engine.rs:671:9\n 10: raftstore::store::async_io::write::Worker<EK,ER,N,T>::write_to_db\n at /workspace/source/tikv/components/raftstore/src/store/async_io/write.rs:754:17\n 11: raftstore::store::async_io::write::Worker<EK,ER,N,T>::run\n at /workspace/source/tikv/components/raftstore/src/store/async_io/write.rs:655:13\n raftstore::store::async_io::write::StoreWriters<EK,ER>::increase_to::{{closure}}::{{closure}}\n at /workspace/source/tikv/components/raftstore/src/store/async_io/write.rs:1036:33\n <std::thread::Builder as tikv_util::sys::thread::StdThreadBuildWrapper>::spawn_wrapper::{{closure}}\n at /workspace/source/tikv/components/tikv_util/src/sys/thread.rs:438:13\n std::sys_common::backtrace::rust_begin_short_backtrace\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121:18\n 12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:551:17\n <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9\n std::panicking::try::do_call\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:483:40\n std::panicking::try\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:447:19\n std::panic::catch_unwind\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:137:14\n std::thread::Builder::spawn_unchecked::{{closure}}\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:550:30\n core::ops::function::FnOnce::call_once{{vtable.shim}}\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513:5\n 13: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9\n <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9\n std::sys::unix::thread::thread::new::thread_start\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/st
[2024/05/04 17:26:00.663 +08:00] [INFO] [lib.rs:88] ["Welcome to TiKV"] [thread_id=0x5]

| username: xfworld

It looks like a flush to disk failed. This error is quite serious.

| username: WalterWj

I think you can check the system logs. Is the disk damaged?

| username: 田帅萌7

dmesg shows no obvious errors. I’ll ask our IDC colleagues to take a look.

| username: WalterWj

Has the IDC team found anything?

| username: 田帅萌7

Not yet. We’ll keep observing.

| username: 田帅萌7

They sent both the hardware and system logs to the vendor, but nothing has been identified. We’ll keep observing for now. Kernel-level logging was not enabled, so nothing was recorded there; we’ll turn it on later.

| username: WalterWj

Alright, let me see if I can get someone from the TiKV R&D team to look at the error stack and determine whether it was also triggered by hardware.

| username: 田帅萌7

No rush. It hasn’t reported any errors for a long time, so it probably isn’t an issue with TiDB itself.

| username: WalterWj

Excellent.

| username: pingyu

libc::fdatasync failed, so this is most likely a disk issue.

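The panic location is raft_engine::file_pipe_log::log_file::LogFileWriter::sync (src/file_pipe_log/log_file.rs:125:9 in the backtrace), where the Result of the sync call is unwrapped.

That sync call invokes fdatasync internally.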
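
To illustrate the failure mode described here, the following is a minimal Rust sketch, not raft-engine's actual code. It assumes Linux, where the standard library's `File::sync_data` is implemented with fdatasync(2). The helper names `append_and_sync` and `unwrap_panic_message` and the temp-file name are hypothetical, and `ErrorKind::Other` stands in for the `Uncategorized` kind from the log, which cannot be named in stable Rust.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::panic;
use std::path::Path;

/// Append `data` to `path` and flush it to stable storage.
/// On Linux, `File::sync_data` issues fdatasync(2); a failing disk
/// would surface here as an `Err` rather than a panic.
fn append_and_sync(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(data)?;
    file.sync_data()
}

/// Reproduce the shape of the FATAL message: unwrap an `Err` whose
/// payload is the custom string "fdatasync" and capture the panic text.
fn unwrap_panic_message() -> String {
    let outcome = panic::catch_unwind(|| {
        // Simulated sync failure (assumption: stands in for a real I/O error).
        let failed: io::Result<()> = Err(io::Error::new(io::ErrorKind::Other, "fdatasync"));
        failed.unwrap();
    });
    match outcome {
        Ok(()) => String::new(),
        // Panic payloads are usually a String or a &str.
        Err(payload) => payload
            .downcast_ref::<String>()
            .cloned()
            .or_else(|| payload.downcast_ref::<&str>().map(|s| s.to_string()))
            .unwrap_or_default(),
    }
}

fn main() {
    let path = std::env::temp_dir().join("fdatasync_sketch.log");
    match append_and_sync(&path, b"entry\n") {
        Ok(()) => println!("synced {}", path.display()),
        // Handling the error keeps the OS error code available for diagnosis.
        Err(e) => eprintln!("sync failed: {e} (os error: {:?})", e.raw_os_error()),
    }
    println!("{}", unwrap_panic_message());
}
```

Note that the error in the log is a `Custom` value carrying only the string "fdatasync", so the log line itself contains no errno. The underlying cause would have to come from kernel or hardware logs, which matches the advice earlier in the thread.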