5.0.1 Cluster: 3 TiKV Nodes Restarted at the Same Time (Not OOM)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 5.0.1 集群 3台tikv 同一时间重启(不是oom)

| username: TiDBer_yyy

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.1
[Reproduction Path] Not reproduced
[Encountered Problem: Problem Phenomenon and Impact]
The cluster has 8 TiKV nodes in total. Three of them restarted at the same time, which made the cluster unavailable.

Error log:

[2023/07/29 07:10:02.331 +08:00] [FATAL] [lib.rs:465] ["index out of bounds: the len is 6 but the index is 6"] [backtrace="stack backtrace:\n   0: tikv_util::set_panic_hook:
:{{closure}}\n             at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/components/tikv_util/src/lib.rs:464\n   1: std::panicking::rust_panic_with_ho
ok\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595\n   2: std::panicking::begin_panic_handler::{{closure}}\n             a
t /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497\n   3: std::sys_common::backtrace::__rust_end_short_backtrace\n             at /rustc/bc3
9d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141\n   4: rust_begin_unwind\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/
/library/std/src/panicking.rs:493\n   5: core::panicking::panic_fmt\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92\n   6:
 core::panicking::panic_bounds_check\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:69\n   7: <usize as core::slice::index::
SliceIndex<[T]>>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:188\n      core::slice::index::<impl core::ops::
index::IndexMut<I> for [T]>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:26\n      <alloc::vec::Vec<T,A> as co
re::ops::index::IndexMut<I>>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/vec/mod.rs:2054\n      tokio_timer::wheel::Wheel<T
>::insert\n             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114\n      tokio_timer::timer::Timer<T,N>::add_entry\n
    at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324\n   8: tokio_timer::timer::Timer<T,N>::process_queue\n             at /rust/reg
istry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301\n   9: <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park\n             at /r
ust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361\n      tokio_timer::timer::Timer<T,N>::turn\n             at /rust/registry/src/github.c
om-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256\n  10: tikv_util::timer::start_global_timer::{{closure}}\n             at /home/jenkins/agent/workspace/build_tik
v_multi_branch_v5.0.1/tikv/components/tikv_util/src/timer.rs:95\n  11: std::sys_common::backtrace::__rust_begin_short_backtrace\n             at /rustc/bc39d4d9c514e5fdb40a5
782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125\n  12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n             at /rustc/bc39d4d9c514e5
fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:474\n  13: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /rustc/b
c39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:322\n  14: std::panicking::try::do_call\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/lib
rary/std/src/panicking.rs:379\n      std::panicking::try\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:343\n      std::panic:
:catch_unwind\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:396\n      std::thread::Builder::spawn_unchecked::{{closure}}\n
       at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:473\n      core::ops::function::FnOnce::call_once{{vtable.shim}}\n             at /ru
stc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227\n  15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n
   at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n
       at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n      std::sys::unix::thread::Thread::new::thread_start\n             at /rustc/bc
39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys/unix/thread.rs:71\n  16: start_thread\n  17: __clone\n"] [location=/rust/registry/src/github.com-1ecc6299db9ec823
/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]

| username: 扬仔_tidb | Original post link

Are there no other error logs? And why does the log contain Jenkins paths?

| username: tidb菜鸟一只 | Original post link

Have these three nodes been running for more than 2 years without a restart?
See: TiKV running over 2 years may panic · Issue #11940 · tikv/tikv (github.com)
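
For context on why roughly two years of uptime matters: the panic says a vector of length 6 was indexed at position 6, inside `tokio_timer::wheel::Wheel::insert`. The sketch below assumes, as I understand tokio-timer 0.2, that the timer wheel has 6 levels of 64 slots each with 1 ms ticks, and that the level is chosen from the highest bit in which the elapsed time and the deadline differ. Under those assumptions, any deadline 2^36 ms (about 795 days) or more after the timer started maps to level 6, which does not exist. The constants and `level_for` here are an illustrative reconstruction, not code copied from the crate.

```rust
// Assumed wheel geometry (illustrative, not copied from tokio-timer).
const NUM_LEVELS: usize = 6;
const BITS_PER_LEVEL: u32 = 6; // 2^6 = 64 slots per level

/// Pick the wheel level for a deadline `when`, given `elapsed`.
/// Both values are milliseconds since the timer was started.
fn level_for(elapsed: u64, when: u64) -> usize {
    let masked = elapsed ^ when;
    assert!(masked != 0, "deadline must differ from elapsed");
    // Position of the highest bit in which the two timestamps differ.
    let significant = 63 - masked.leading_zeros();
    (significant / BITS_PER_LEVEL) as usize
}

/// Largest span (in ms) the assumed wheel can address: 64^6 = 2^36.
fn max_wheel_ms() -> u64 {
    1u64 << (BITS_PER_LEVEL * NUM_LEVELS as u32)
}

fn main() {
    let limit_ms = max_wheel_ms();
    let limit_days = limit_ms as f64 / 86_400_000.0;
    println!("wheel range: {} ms (~{:.1} days)", limit_ms, limit_days);

    // Just below the limit the deadline still fits in the top level (index 5).
    println!("level below the limit: {}", level_for(0, limit_ms - 1));

    // At the limit the computed level equals NUM_LEVELS, so indexing a
    // 6-element vector of levels with it would panic.
    let level = level_for(0, limit_ms);
    println!("level at the limit: {} (only {} levels exist)", level, NUM_LEVELS);
}
```

If this reading is right, it would also explain why several nodes fail together: nodes started at about the same time reach the limit at about the same time.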

| username: TiDBer_yyy | Original post link

That is possible. I wasn’t sure whether TiKV has an uptime monitoring chart.
Found it: TiKV Monitoring Metrics Explained | PingCAP Documentation Center
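
Once the uptime of each node is known, a small check like the one below could flag nodes that are getting close to the two-year mark. This is only a sketch: it assumes the limit is 2^36 ms (about 795 days), which is my reading of the "over 2 years" in the issue title, and it assumes the uptime values have already been read from monitoring as seconds. The node names, the 30-day margin, and `needs_restart_soon` are all hypothetical.

```rust
// Assumed limit: 2^36 ms expressed in whole seconds (about 795 days).
const ASSUMED_LIMIT_SECS: u64 = (1u64 << 36) / 1000;
// Hypothetical safety margin: flag nodes 30 days before the assumed limit.
const MARGIN_SECS: u64 = 30 * 86_400;

/// Returns true when a node's uptime is within the margin of the assumed limit.
fn needs_restart_soon(uptime_secs: u64) -> bool {
    uptime_secs + MARGIN_SECS >= ASSUMED_LIMIT_SECS
}

fn main() {
    // Hypothetical uptimes in seconds, e.g. copied from the uptime chart.
    let nodes = [("tikv-1", 100 * 86_400u64), ("tikv-2", 780 * 86_400u64)];
    for (name, uptime) in nodes.iter() {
        println!(
            "{}: {} days up, restart soon: {}",
            name,
            uptime / 86_400,
            needs_restart_soon(*uptime)
        );
    }
}
```

Restarting flagged nodes one at a time, rather than letting them hit the limit together, would avoid losing several TiKV nodes at once.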

| username: cassblanca | Original post link

This isn’t a TiDB log, is it? Why is there Jenkins build information?

| username: TiDBer_yyy | Original post link

These are TiKV logs.
