TiKV service automatically restarted

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv服务自动重启了

| username: furyamber

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.0.2
[Reproduction Path] Mostly idle
[Encountered Problem: Phenomenon and Impact] At 18:42 I received an alert that the tikv service had crashed. When I logged in to the server, I found that the service had already restarted automatically after the crash. The tikv.log contained the following error:

[2023/09/12 18:42:10.139 +08:00] [FATAL] [lib.rs:465] ["index out of bounds: the len is 6 but the index is 6"] [backtrace="stack backtrace:
0: tikv_util::set_panic_hook::{{closure}}
at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:464
1: std::panicking::rust_panic_with_hook
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595
2: std::panicking::begin_panic_handler::{{closure}}
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497
3: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141
4: rust_begin_unwind
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:493
5: core::panicking::panic_fmt
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92
6: core::panicking::panic_bounds_check
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:69
7: <usize as core::slice::index::SliceIndex<[T]>>::index_mut
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:188
core::slice::index::<impl core::ops::index::IndexMut for [T]>::index_mut
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:26
<alloc::vec::Vec<T,A> as core::ops::index::IndexMut>::index_mut
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/vec/mod.rs:2054
tokio_timer::wheel::Wheel::insert
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114
tokio_timer::timer::Timer<T,N>::add_entry
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324
8: tokio_timer::timer::Timer<T,N>::process_queue
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301
9: <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361
tokio_timer::timer::Timer<T,N>::turn
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256
10: tikv_util::timer::start_global_timer::{{closure}}
at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/tikv_util/src/timer.rs:95
11: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125
12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:474
13: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:322
14: std::panicking::try::do_call
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:379
std::panicking::try
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:343
std::panic::catch_unwind
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:396
std::thread::Builder::spawn_unchecked::{{closure}}
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:473
core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227
15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484
std::sys::unix::thread::thread::new::thread_start
at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys/unix/thread.rs:71
16: start_thread
17: __clone
"] [location=/rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]

It seems like a bug. How should this problem be resolved?

| username: furyamber

This is the TiKV log. The error occurred at 18:42.

| username: tidb菜鸟一只

Has this TiKV instance been running for about two years without a restart? If so, this is a known bug.
I suggest upgrading to a newer version.
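
For context, the panic message fits a hierarchical timing wheel that has run out of levels. The sketch below is a simplified Python model, not the tokio-timer source. It assumes 6 levels (matching "the len is 6" in the log), 64 slots per level, and 1 ms ticks; `level_for`, `NUM_LEVELS`, `BITS_PER_LEVEL`, and `MAX_MS` are illustrative names. Under those assumptions the wheel covers 2^36 ms, roughly 795 days, which would be consistent with "about two years".

```python
# Simplified model of a hierarchical timing wheel (all constants are assumptions).
NUM_LEVELS = 6        # matches "the len is 6" in the panic message
BITS_PER_LEVEL = 6    # assumption: 64 slots per level
MS_PER_DAY = 86_400_000

# Largest span the modelled wheel can address, in ticks of 1 ms (assumption).
MAX_MS = 1 << (NUM_LEVELS * BITS_PER_LEVEL)   # 2**36
MAX_DAYS = MAX_MS / MS_PER_DAY


def level_for(elapsed_ms: int, when_ms: int) -> int:
    """Pick the wheel level from the highest bit that differs between
    the current elapsed time and the deadline."""
    masked = elapsed_ms ^ when_ms
    if masked == 0:
        raise ValueError("deadline equals the current time")
    significant = masked.bit_length() - 1   # index of the highest set bit
    return significant // BITS_PER_LEVEL


if __name__ == "__main__":
    print(f"modelled wheel range: {MAX_DAYS:.1f} days")
    # A deadline that crosses 2**36 ms maps to level 6, which a
    # 6-entry level list (valid indices 0..5) does not have.
    print(level_for(MAX_MS - 1, MAX_MS + 10))
```

If these assumptions hold, any deadline past 2^36 ms since the timer started would select level 6, an index that does not exist in a 6-entry vector, which is the shape of the error in the log.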

| username: furyamber

Okay, thank you.

| username: DBRE

I ran into this bug a while ago. You can scan all TiKV instances in the cluster and raise a warning about a month before the limit is reached, so the restart can be planned in advance.

SELECT INSTANCE, START_TIME, TIMESTAMPDIFF(DAY, START_TIME, NOW())
FROM information_schema.CLUSTER_INFO
WHERE TYPE = 'tikv' AND TIMESTAMPDIFF(DAY, START_TIME, NOW()) > 765;

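
The same check can be scripted once the start times have been fetched. This is a minimal sketch, assuming the limit is roughly 795 days of uptime (an inference from the 765-day threshold plus the one-month warning, not a figure stated in this thread). `needs_restart_soon`, `LIMIT_DAYS`, and `WARN_AHEAD_DAYS` are hypothetical names.

```python
from datetime import datetime, timedelta

LIMIT_DAYS = 795        # assumption: approximate uptime at which the panic occurs
WARN_AHEAD_DAYS = 30    # warn about one month in advance, as in the query


def needs_restart_soon(start_time: datetime, now: datetime,
                       limit_days: int = LIMIT_DAYS,
                       warn_ahead_days: int = WARN_AHEAD_DAYS) -> bool:
    """Return True when whole days of uptime exceed the warning threshold.

    Mirrors TIMESTAMPDIFF(DAY, START_TIME, NOW()) > 765 with the defaults.
    """
    uptime_days = (now - start_time).days
    return uptime_days > limit_days - warn_ahead_days


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    print(needs_restart_soon(start, start + timedelta(days=700)))
    print(needs_restart_soon(start, start + timedelta(days=770)))
```

An instance flagged by this check could then be restarted or upgraded during a maintenance window instead of crashing unexpectedly.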
| username: Mingdr

We also encountered this problem. It took us a long time to find out it was a bug. :mask:

| username: cassblanca

It hasn’t been restarted for two years, right? Known bug.