Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.Original topic: 生产环境tikv异常重启
[TiDB Usage Environment] Production Environment
[TiDB Version] 3.0.8
[Encountered Problem: Symptoms and Impact]
Today, the memory of two TiKV nodes dropped significantly. Upon inspection, it was found that the TiKV services on both servers had restarted at 22:02:00 on November 15th.
[Resource Configuration] 5kv-2server-3pd
[Attachments: Screenshots/Logs/Monitoring]
tidb.log of the tidb node:
Problematic 103 tikv server:
The FATAL log is:
[2022/11/15 22:01:36.060 +08:00] [FATAL] [lib.rs:499] ["index out of bounds: the len is 6 but the index is 6"] [backtrace="stack backtrace:\n 0: 0x56119bd6978d - backtrace::backtrace::libunwind::trace::h958f5f3eb75b2917\n at /rust/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/libunwind.rs:54\n - backtrace::backtrace::trace::hdf994f7eb3c12b81\n at /rust/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/mod.rs:70\n 1: 0x56119bd5f490 - tikv_util::set_panic_hook::{{closure}}::hf6c0260b2e4aea39\n at /home/jenkins/.target/release/build/backtrace-e20a32a05fd0b8fe/out/capture.rs:79\n 2: 0x56119bf090ff - std::panicking::rust_panic_with_hook::h8d2408723e9a2bd4\n at src/libstd/panicking.rs:479\n 3: 0x56119bf08edd - std::panicking::continue_panic_fmt::hb2aaa9386c4e5e80\n at src/libstd/panicking.rs:382\n 4: 0x56119bf18745 - rust_begin_unwind\n at src/libstd/panicking.rs:309\n 5: 0x56119bf232eb - core::panicking::panic_fmt::h79e840586f23493b\n at src/libcore/panicking.rs:85\n 6: 0x56119bf22df3 - core::panicking::panic_bounds_check::h9293ee7846bbb139\n at src/libcore/panicking.rs:61\n 7: 0x56119bd675fe - <usize as core::slice::SliceIndex<[T]>>::index_mut::h8595bb10a522f6ec\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libcore/slice/mod.rs:2700\n - core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut::h72f80a2aa264d71e\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libcore/slice/mod.rs:2561\n - <alloc::vec::Vec<T> as core::ops::index::IndexMut<I>>::index_mut::hf0b55e5b5c64c8b5\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/liballoc/vec.rs:1768\n - tokio_timer::wheel::Wheel<T>::insert::ha0811b20f18170ea\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.12/src/wheel/mod.rs:114\n - tokio_timer::timer::Timer<T,N>::add_entry::h417301f6c9308779\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/mod.rs:324\n 8: 0x56119bd666bf - tokio_timer::timer::Timer<T,N>::process_queue::h39928e46093b552d\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/mod.rs:0\n 9: 0x56119bd65e0b - <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park::hdf0d553584b8cb22\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/mod.rs:361\n - tokio_timer::timer::Timer<T,N>::turn::h67305fd216915f0e\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.12/src/timer/mod.rs:256\n 10: 0x56119bd654b3 - tikv_util::timer::start_global_timer::{{closure}}::h418812a1f8076ef5\n at components/tikv_util/src/timer.rs:94\n 11: 0x56119bd651e5 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7a15993f24510b40\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/sys_common/backtrace.rs:77\n 12: 0x56119bd651d5 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::hc908468389262d3d\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/thread/mod.rs:470\n 13: 0x56119bd651c5 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hea2226c234dcd674\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/panic.rs:309\n 14: 0x56119bd651b8 - std::panicking::try::do_call::h5ae108cfa09964f0\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/panicking.rs:294\n - std::panicking::try::hb747e5c89d6c749b\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250//src/libpanic_abort/lib.rs:29\n - std::panic::catch_unwind::h9d095dd8e5bc9701\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/panic.rs:388\n - std::thread::Builder::spawn_unchecked::{{closure}}::hd8b8a97aa6180ba0\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libstd/thread/mod.rs:469\n - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc4c2684293abcd20\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libcore/ops/function.rs:231\n 15: 0x56119bf1774e - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::he71721d2d956d451\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/liballoc/boxed.rs:746\n 16: 0x56119bf19a7b - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::he520045b8d28ce5c\n at /rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/liballoc/boxed.rs:746\n - std::sys_common::thread::start_thread::h2e98d1272dc6d74b\n at src/libstd/sys_common/thread.rs:13\n - std::sys::unix::thread::Thread::new::thread_start::h18485805666ccd3c\n at src/libstd/sys/unix/thread.rs:79\n 17: 0x7f42e149add4 - start_thread\n 18: 0x7f42e0ba202c - __clone\n 19: 0x0 - <unknown>"] [location=/rustc/0e4a56b4b04ea98bb16caada30cb2418dd06e250/src/libcore/slice/mod.rs:2700] [thread_name=timer]
[2022/11/15 22:02:01.278 +08:00] [INFO] [mod.rs:26] ["Welcome to TiKV."]
103 node message log:
Both restarted kv nodes have the same issue, attached monitoring:
The anomalies in the images are from the two problematic servers.
Could you please explain the reason for this? It’s a bit alarming that both servers encountered issues simultaneously.