TiKV Restart

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv重启

| username: TiDBer_P49NMjIm

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Encountered Issue: Problem Phenomenon and Impact]
TiKV restarted.

[2024/06/21 21:55:46.474 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E265CE6 lock_version: 450624525816037503 key: 7480000000000000DD5F698000000000000003038000000015281FFC03800000000002053603800000000000000003800000000000000003800000000000000003800000000E265CE6 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624525816037509”]
[2024/06/21 21:55:53.833 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E260410 lock_version: 450624527742795973 key: 7480000000000000DD5F69800000000000000303800000001528203A03800000000002053603800000000000000003800000000000000003800000000000000003800000000E260410 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624527742795974”]
[2024/06/21 21:55:56.222 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E260414 lock_version: 450624528371941463 key: 7480000000000000DD5F69800000000000000303800000001527F1F503800000000002053603800000000000000003800000000000000003800000000000000003800000000E260414 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624528371941464”]
[2024/06/21 21:55:58.324 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E260419 lock_version: 450624528922443934 key: 7480000000000000DD5F698000000000000003038000000015281FFB03800000000002053603800000000000000003800000000000000003800000000000000003800000000E260419 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624528922443935”]
[2024/06/21 21:55:59.526 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E26A8A1 lock_version: 450624529237016691 key: 7480000000000000DD5F69800000000000000303800000000A7A8C6A03800000000002053603800000000000000003800000000000000003800000000000000003800000000E26A8A1 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624529237016692”]
[2024/06/21 21:56:01.372 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E26041C lock_version: 450624529721983078 key: 7480000000000000DD5F698000000000000003038000000015281EDA03800000000002053603800000000000000003800000000000000003800000000000000003800000000E26041C lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624529721983079”]
[2024/06/21 21:56:02.768 +03:00] [WARN] [endpoint.rs:606] [error-response] [err=“Key is locked (will clean up) primary_lock: 7480000000000000DD5F69800000000000000203800000000002053603800000000E26A8A6 lock_version: 450624530088984628 key: 7480000000000000DD5F69800000000000000303800000001528155203800000000002053603800000000000000003800000000000000003800000000000000003800000000E26A8A6 lock_ttl: 3000 txn_size: 1 use_async_commit: true min_commit_ts: 450624530088984629”]

The log shows a large number of these warnings, and then the TiKV node restarts.
Monitoring also shows a large number of "not leader" errors.

TiKV then restarts with the following panic:
[2024/06/21 21:56:12.399 +03:00] [INFO] [] [“New connected subchannel at 0x7efa158c3350 for subchannel 0x7efce064f540”]
[2024/06/21 21:56:12.414 +03:00] [FATAL] [lib.rs:465] [“index out of bounds: the len is 6 but the index is 6”] [backtrace=" 0: tikv_util::set_panic_hook::{{closure}}\n at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:464:18\n 1: std::panicking::rust_panic_with_hook\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:626:17\n 2: std::panicking::begin_panic_handler::{{closure}}\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:519:13\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/sys_common/backtrace.rs:141:18\n 4: rust_begin_unwind\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:515:5\n 5: core::panicking::panic_fmt\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/panicking.rs:92:14\n 6: core::panicking::panic_bounds_check\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/panicking.rs:69:5\n 7: <usize as core::slice::index::SliceIndex<[T]>>::index_mut\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/slice/index.rs:190:14\n core::slice::index::<impl core::ops::index::IndexMut for [T]>::index_mut\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/slice/index.rs:26:9\n <alloc::vec::Vec<T,A> as core::ops::index::IndexMut>::index_mut\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/alloc/src/vec/mod.rs:2445:9\n tokio_timer::wheel::Wheel::insert\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114:9\n tokio_timer::timer::Timer<T,N>::add_entry\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324:15\n 8: tokio_timer::timer::Timer<T,N>::process_queue\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301:21\n 9: 
<tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361:9\n tokio_timer::timer::Timer<T,N>::turn\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256:21\n 10: tikv_util::timer::start_global_timer::{{closure}}\n at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/timer.rs:98:17\n 11: std::sys_common::backtrace::__rust_begin_short_backtrace\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/sys_common/backtrace.rs:125:18\n 12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/thread/mod.rs:476:17\n 13: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panic.rs:347:9\n 14: std::panicking::try::do_call\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:401:40\n std::panicking::try\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panicking.rs:365:19\n std::panic::catch_unwind\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/panic.rs:434:14\n std::thread::Builder::spawn_unchecked::{{closure}}\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/thread/mod.rs:475:30\n core::ops::function::FnOnce::call_once{{vtable.shim}}\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/core/src/ops/function.rs:227:5\n 15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/alloc/src/boxed.rs:1572:9\n <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/alloc/src/boxed.rs:1572:9\n std::sys::unix::thread::thread::new::thread_start\n at 
/rustc/2faabf579323f5252329264cc53ba9ff803429a3/library/std/src/sys/unix/thread.rs:91:17\n 16: start_thread\n 17: __clone\n"] [location=/rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]


| username: zhaokede

Did you perform this operation while there were still transactions occurring?

| username: TiDBer_P49NMjIm

It was all normal business traffic; there were no anomalies.

| username: TiDBer_P49NMjIm

Judging from other posts online, this looks like a bug. How should we handle it?

| username: Kongdom

In similar earlier cases, the cause was a bug that is triggered when TiKV runs for a long time without being restarted.

| username: TiDBer_P49NMjIm

Indeed, that is the case here.

| username: Kongdom

:thinking: Then we can only consider upgrading.

| username: tidb菜鸟一只

This is a bug in version 5.4.0 that makes TiKV panic and restart after about two years of continuous uptime. Upgrading resolves it.
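
For anyone wondering where "two years" comes from, here is a rough sketch of the arithmetic. It assumes the timer wheel in the backtrace (tokio-timer 0.2.13) has 6 levels of 64 slots each and ticks once per millisecond; those numbers and the `level_for` helper are my simplified model of a hierarchical timer wheel, not code copied from the crate. Under that assumption the wheel covers 2^36 ms, and a deadline that crosses that boundary maps to level 6 of a 6-element array, which would match the "the len is 6 but the index is 6" panic.

```python
# Assumed geometry of the timer wheel (not taken from the thread):
NUM_LEVELS = 6      # matches "the len is 6" in the panic message
SLOT_BITS = 6       # assumption: 64 slots per level
TICK_MS = 1         # assumption: one tick per millisecond


def level_for(elapsed_ticks, when_ticks):
    """Pick a level from the highest bit in which the two tick counts differ."""
    masked = (elapsed_ticks ^ when_ticks) | ((1 << SLOT_BITS) - 1)
    significant = masked.bit_length() - 1
    return significant // SLOT_BITS


# Total time the wheel can represent before the level index overflows.
wheel_span_ms = (1 << (SLOT_BITS * NUM_LEVELS)) * TICK_MS  # 2**36 ms
span_days = wheel_span_ms / 86_400_000

# A timer registered just before the span is used up, due just after it.
boundary = 1 << (SLOT_BITS * NUM_LEVELS)
bad_level = level_for(boundary - 1, boundary + 10)

print(f"wheel span: {span_days:.1f} days ({span_days / 365.25:.2f} years)")
print(f"level for a deadline crossing the boundary: {bad_level} "
      f"(valid levels: 0..{NUM_LEVELS - 1})")
```

If the model holds, the span works out to roughly 795 days, so a node that has not been restarted for that long would hit the panic once, restart, and then run normally until the same uptime is reached again.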

| username: h5n1

Upgrade to the latest minor version.

| username: TiDBer_7S8XqKfl-1158

The error “Key is locked” in the logs usually indicates that a key is locked by another transaction during execution. This may be due to a transaction not being properly committed or rolled back.
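
To check how old the locks in those warnings actually are, the timestamps can be decoded. The sketch below relies on the TSO layout used by TiDB (physical time in milliseconds in the high bits, an 18-bit logical counter in the low bits) and treats lock_ttl as milliseconds. The helper names `decode_tso` and `parse_lock_fields` are made up for this example, and the sample string is a shortened copy of the first warning above with the key fields left out.

```python
import re
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # low bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_tso(ts):
    """Split a TSO into (physical time as a UTC datetime, logical counter)."""
    physical_ms = ts >> LOGICAL_BITS
    logical = ts & ((1 << LOGICAL_BITS) - 1)
    return EPOCH + timedelta(milliseconds=physical_ms), logical


def parse_lock_fields(line):
    """Extract the integer fields of a 'Key is locked' warning into a dict."""
    fields = {}
    for name in ("lock_version", "lock_ttl", "txn_size", "min_commit_ts"):
        match = re.search(rf"{name}: (\d+)", line)
        if match:
            fields[name] = int(match.group(1))
    return fields


# Shortened copy of the first warning in this thread (key fields omitted).
sample = (
    "Key is locked (will clean up) lock_version: 450624525816037503 "
    "lock_ttl: 3000 txn_size: 1 use_async_commit: true "
    "min_commit_ts: 450624525816037509"
)

fields = parse_lock_fields(sample)
started_at, logical = decode_tso(fields["lock_version"])

print("lock taken at (UTC):", started_at.isoformat())
print("logical counter:", logical)
print("lock TTL (ms):", fields["lock_ttl"])
```

For the first entry, the decoded start time falls within the same second as the log line itself, and the TTL is 3000. That suggests these are short-lived locks from ordinary transactions that a read happened to run into, rather than stuck transactions, so the warnings are probably unrelated to the panic that follows.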