TiKV: Failed to Create KV Engine

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv: failed to create kv engine

| username: wfxxh

【TiDB Usage Environment】Production
【TiDB Version】v6.5.3
【Reproduction Path】After modifying the TiKV configuration, the restart failed; the error is as follows:

| username: tidb菜鸟一只 | Original post link

Is this file missing?

| username: wfxxh | Original post link

The file doesn't exist. The MANIFEST file actually present in that directory is MANIFEST-1139502, but TiKV looks for MANIFEST-1430889 at startup.
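For context on this mismatch: RocksDB (TiKV's storage engine) records the name of the live MANIFEST in a small text file named CURRENT inside the DB directory, and it refuses to open when that MANIFEST is missing. A minimal sketch of the same check, simulated in a temp directory (in a real deployment you would inspect CURRENT under the TiKV data dir, e.g. `<data-dir>/db/CURRENT`; that path is an assumption, not taken from this thread):

```python
import os
import tempfile

db = tempfile.mkdtemp()

# What CURRENT says the engine must open at startup:
with open(os.path.join(db, "CURRENT"), "w") as f:
    f.write("MANIFEST-1430889\n")

# What is actually present on disk:
open(os.path.join(db, "MANIFEST-1139502"), "w").close()

wanted = open(os.path.join(db, "CURRENT")).read().strip()
present = sorted(n for n in os.listdir(db) if n.startswith("MANIFEST-"))
if wanted not in present:
    # This is the situation that surfaces as "failed to create kv engine".
    print(f"CURRENT wants {wanted}, but only {present} exist")
```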

| username: wfxxh | Original post link

[The error was originally posted as an image, which is not visible here.]

| username: Billmay表妹 | Original post link

Please paste the text version of the error message.

| username: wfxxh | Original post link

```
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:85] ["Welcome to TiKV"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Release Version:   6.5.3"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Edition:           Community"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Git Commit Hash:   fd5f88a7fdda1bf70dcb0d239f60137110c54d46"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Git Commit Branch: heads/refs/tags/v6.5.3"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["UTC Build Time:    Unknown (env var does not exist when building)"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Rust Version:      rustc 1.67.0-nightly (96ddd32c4 2022-11-14)"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Enable Features:   pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [lib.rs:90] ["Profile:           dist_release"]
[2023/08/03 21:25:42.271 +08:00] [INFO] [mod.rs:79] ["cgroup quota: memory=Some(9223372036854771712), cpu=None, cores={9, 20, 36, 2, 26, 32, 24, 33, 14, 3, 28, 15, 18, 21, 37, 1, 31, 23, 16, 25, 38, 39, 27, 13, 29, 5, 6, 7, 0, 34, 10, 17, 30, 8, 11, 12, 35, 22, 4, 19}"]
[2023/08/03 21:25:42.273 +08:00] [INFO] [mod.rs:86] ["memory limit in bytes: 415182777344, cpu cores quota: 40"]
[2023/08/03 21:25:42.273 +08:00] [WARN] [lib.rs:543] ["environment variable `TZ` is missing, using `/etc/localtime`"]
[2023/08/03 21:25:42.273 +08:00] [INFO] [config.rs:717] ["kernel parameters"] [value=32768] [param=net.core.somaxconn]
[2023/08/03 21:25:42.273 +08:00] [INFO] [config.rs:717] ["kernel parameters"] [value=0] [param=net.ipv4.tcp_syncookies]
[2023/08/03 21:25:42.273 +08:00] [INFO] [config.rs:717] ["kernel parameters"] [value=0] [param=vm.swappiness]
[2023/08/03 21:25:42.275 +08:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=10.1.3.143:2379]
[2023/08/03 21:25:42.275 +08:00] [INFO] [<unknown>] ["TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter"]
[2023/08/03 21:25:42.277 +08:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=10.1.3.144:2379]
[2023/08/03 21:25:42.279 +08:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=10.1.3.145:2379]
[2023/08/03 21:25:42.280 +08:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://10.1.3.144:2379]
[2023/08/03 21:25:42.281 +08:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://10.1.3.145:2379]
[2023/08/03 21:25:42.282 +08:00] [INFO] [util.rs:763] ["connected to PD member"] [endpoints=http://10.1.3.145:2379]
[2023/08/03 21:25:42.282 +08:00] [INFO] [util.rs:590] ["all PD endpoints are consistent"] [endpoints="[\"10.1.3.143:2379\", \"10.1.3.144:2379\", \"10.1.3.145:2379\"]"]
[2023/08/03 21:25:42.282 +08:00] [INFO] [server.rs:464] ["connect to PD cluster"] [cluster_id=7252512312611517433]
[2023/08/03 21:25:42.329 +08:00] [INFO] [server.rs:1898] ["beginning system configuration check"]
[2023/08/03 21:25:42.329 +08:00] [INFO] [config.rs:904] ["data dir"] [mount_fs="FsInfo { tp: \"ext4\", opts: \"rw,noatime,nodelalloc,data=ordered\", mnt_dir: \"/es0-1\", fsname: \"/dev/nvme0n1\" }"] [data_path=/es0-1/tikv_data]
[2023/08/03 21:25:42.329 +08:00] [INFO] [config.rs:904] ["data dir"] [mount_fs="FsInfo { tp: \"ext4\", opts: \"rw,noatime,nodelalloc,data=ordered\", mnt_dir: \"/es0-1\", fsname: \"/dev/nvme0n1\" }"] [data_path=/es0-1/tikv_data/raft]
[2023/08/03 21:25:42.329 +08:00] [INFO] [server.rs:433] ["using config"] [config="{\"log-level\":\"info\",\"log-file\":\"\",\"log-format\":\"text\",\"log-rotation-timespan\":\"0s\",\"log-rotation-size\":\"300MiB\",\"slow-log-file\":\"\",\"slow-log-threshold\":\"1s\",\"panic-when-unexpected-key-or-data\":false,\"abort-on-panic\":false,\"memory-usage-limit\":\"46528812373B\",\"memory-usage-high-water\":0.9,\"log\":{\"level\":\"info\",\"format\":\"text\",\"enable-timestamp\":true,\"file\":{\"filename\":\"/home/tidb/deploy/tikv-20160/log/tikv.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":3}},\"quota\":{\"foreground-cpu-time\":0,\"foreground-write-bandwidth\":\"0KiB\",\"foreground-read-bandwidth\":\"0KiB\",\"max-delay-duration\":\"500ms\",\"background-cpu-time\":0,\"background-write-bandwidth\":\"0KiB\",\"background-read-bandwidth\":\"0KiB\",\"enable-auto-tune\":false},\"readpool\":{\"unified\":{\"min-thread-count\":1,\"max-thread-count\":8,\"stack-size\":\"10MiB\",\"max-tasks-per-worker\":2000,\"auto-adjust-pool-size\":false},\"storage\":{\"use-unified-pool\":true,\"high-concurrency\":8,\"normal-concurrency\":8,\"low-concurrency\":8,\"max-tasks-per-worker-high\":2000,\"max-tasks-per-worker-normal\":2000,\"max-tasks-per-worker-low\":2000,\"stack-size\":\"10MiB\"},\"coprocessor\":{\"use-unified-pool\":true,\"high-concurrency\":32,\"normal-concurrency\":32,\"low-concurrency\":32,\"max-tasks-per-worker-high\":2000,\"max-tasks-per-worker-normal\":2000,\"max-tasks-per-worker-low\":2000,\"stack-size\":\"10MiB\"}},\"server\":{\"addr\":\"0.0.0.0:20160\",\"advertise-addr\":\"10.1.3.2:20160\",\"status-addr\":\"0.0.0.0:20180\",\"advertise-status-addr\":\"10.1.3.2:20180\",\"status-thread-pool-size\":1,\"max-grpc-send-msg-len\":10485760,\"raft-client-grpc-send-msg-buffer\":524288,\"raft-client-queue-size\":8192,\"raft-msg-max-batch-size\":128,\"grpc-compression-type\":\"none\",\"grpc-gzip-compression-level\":2,\"grpc-min-message-size-to-compress\":4096,\"grpc-concurrency\":5,\"grpc-concurrent-stream\":1024,\"grpc-raft-conn-num\":1,\"grpc-memory-pool-quota\":\"9223372036854775807B\",\"grpc-stream-initial-window-size\":\"2MiB\",\"grpc-keepalive-time\":\"10s\",\"grpc-keepalive-timeout\":\"3s\",\"concurrent-send-snap-limit\":32,\"concurrent-recv-snap-limit\":32,\"end-point-recursion-limit\":1000,\"end-point-stream-channel-size\":8,\"end-point-batch-row-limit\":64,\"end-point-stream-batch-row-limit\":128,\"end-point-enable-batch-if-possible\":true,\"end-point-request-max-handle-duration\":\"1m\",\"end-point-max-concurrency\":40,\"end-point-perf-level\":0,\"snap-max-write-bytes-per-sec\":\"100MiB\",\"snap-max-total-size\":\"0KiB\",\"stats-concurrency\":1,\"heavy-load-threshold\":75,\"heavy-load-wait-duration\":null,\"enable-request-batch\":true,\"background-thread-count\":3,\"end-point-slow-log-threshold\":\"1s\",\"forward-max-connections-per-address\":4,\"reject-messages-on-memory-ratio\":0.2,\"simplify-metrics\":false,\"labels\":{}},\"storage\":{\"data-dir\":\"/es0-1/tikv_data\",\"gc-ratio-threshold\":1.1,\"max-key-size\":8192,\"scheduler-concurrency\":524288,\"scheduler-worker-pool-size\":8,\"scheduler-pending-write-threshold\":\"100MiB\",\"reserve-space\":\"5GiB\",\"reserve-raft-space\":\"1GiB\",\"enable-async-apply-prewrite\":false,\"api-version\":1,\"enable-ttl\":false,\"background-error-recovery-window\":\"1h\",\"ttl-check-poll-interval\":\"12h\",\"flow-control\":{\"enable\":true,\"soft-pending-compaction-bytes-limit\":\"192GiB\",\"hard-pending-compaction-bytes-limit\":\"1TiB\",\"memtables-threshold\":5,\"l0-files-threshold\":20},\"block-cache\":{\"shared\":true,\"capacity\":\"26GiB\",\"num-shard-bits\":6,\"strict-capacity-limit\":false,\"high-pri-pool-ratio\":0.8,\"memory-allocator\":\"nodump\"},\"io-rate-limit\":{\"max-bytes-per-sec\":\"0KiB\",\"mode\":\"write-only\",\"strict\":false,\"foreground-read-priority\":\"high\",\"foreground-write-priority\":\"high\",\"flush-priority\":\"high\",\"level-zero-compaction-priority\":\"medium\",\"compaction-priority\":\"low\",\"replication-priority\":\"high\",\"load-balance-priority\":\"high\",\"gc-priority\":\"high\",\"import-priority\":\"medium\",\"export-priority\":\"medium\",\"other-priority\":\"high\"}},\"pd\":{\"endpoints\":[\"10.1.3.143:2379\",\"10.1.3.144:2379\",\"10.1.3.145:2379\"],\"retry-interval\":\"300ms\",\"retry-max-count\":9223372036854775807,\"retry-log-every\":10,\"update-interval\":\"10m\",\"enable-forwarding\":false},\"metric\":{\"job\":\"tikv\"},\"raftstore\":{\"prevote\":true,\"raftdb-path\":\"/es0-1/tikv_data/raft\",\"capacity\":\"0KiB\",\"raft-base-tick-interval\":\"1s\",\"raft-heartbeat-ticks\":2,\"raft-election-timeout-ticks\":10,\"raft-min-election-timeout-ticks\":10,\"raft-max-election-timeout-ticks\":20,\"raft-max-size-per-msg\":\"1MiB\",\"raft-max-inflight-msgs\":256,\"raft-entry-max-size\":\"110MiB\",\"raft-log-compact-sync-interval\":\"2s\",\"raft-log-gc-tick-interval\":\"3s\",\"raft-log-gc-threshold\":50,\"raft-log-gc-count-limit\":73728,\"raft-log-gc-size-limit\":\"72MiB\",\"raft-log-reserve-max-ticks\":6,\"raft-engine-purge-interval\":\"10s\",\"raft-entry-cache-life-time\":\"30s\",\"split-region-check-tick-interval\":\"10s\",\"region-split-check-diff\":\"6MiB\",\"region-compact-check-interval\":\"5m\",\"region-compact-check-step\":100,\"region-compact-min-tombstones\":10000,\"region-compact-tombstones-percent\":30,\"pd-heartbeat-tick-interval\":\"1m\",\"pd-store-heartbeat-tick-interval\":\"10s\",\"snap-mgr-gc-tick-interval\":\"1m\",\"snap-gc-timeout\":\"4h\",\"lock-cf-compact-interval\":\"10m\",\"lock-cf-compact-bytes-threshold\":\"256MiB\",\"notify-capacity\":40960,\"messages-per-tick\":4096,\"max-peer-down-duration\":\"10m\",\"max-leader-missing-duration\":\"2h\",\"abnormal-leader-missing-duration\":\"10m\",\"peer-stale-state-check-interval\":\"5m\",\"leader-transfer-max-log-lag\":128,\"snap-apply-batch-size\":\"10MiB\",\"snap-apply-copy-symlink\":false,\"region-worker-tick-interval\":\"1s\",\"clean-stale-ranges-tick\":10,\"consistency-check-interval\":\"0s\",\"report-region-flow-interval\":\"1m\",\"raft-store-max-leader-lease\":\"9s\",\"check-leader-lease-interval\":\"2s250ms\",\"renew-leader-lease-advance-duration\":\"2s250ms\",\"right-derive-when-split\":true,\"merge-max-log-gap\":10,\"merge-check-tick-interval\":\"2s\",\"use-delete-range\":false,\"snap-generator-pool-size\":2,\"cleanup-import-sst-interval\":\"10m\",\"local-read-batch-size\":1024,\"apply-max-batch-size\":256,\"apply-pool-size\":2,\"apply-reschedule-duration\":\"5s\",\"apply-low-priority-pool-size\":1,\"store-max-batch-size\":256,\"store-pool-size\":2,\"store-reschedule-duration\":\"5s\",\"store-low-priority-pool-size\":0,\"store-io-pool-size\":0,\"store-io-notify-capacity\":40960,\"future-poll-size\":1,\"hibernate-regions\":true,\"dev-assert\":false,\"apply-yield-duration\":\"500ms\",\"apply-yield-write-size\":\"32KiB\",\"perf-level\":0,\"evict-cache-on-memory-ratio\":0.0,\"cmd-batch\":true,\"cmd-batch-concurrent-ready-max-count\":1,\"raft-write-size-limit\":\"1MiB\",\"waterfall-metrics\":true,\"io-reschedule-concurrent-max-count\":4,\"io-reschedule-hotpot-duration\":\"5s\",\"inspect-interval\":\"500ms\",\"report-min-resolved-ts-interval\":\"1s\",\"reactive-memory-lock-tick-interval\":\"2s\",\"reactive-memory-lock-timeout-tick\":5,\"report-region-buckets-tick-interval\":\"10s\",\"check-long-uncommitted-interval\":\"10s\",\"long-uncommitted-base-threshold\":\"20s\",\"max-entry-cache-warmup-duration\":\"1s\",\"max-snapshot-file-raw-size\":\"100MiB\",\"unreachable-backoff\":\"10s\"},\"coprocessor\":{\"split-region-on-table\":false,\"batch-split-limit\":10,\"region-max-size\":\"144MiB\",\"region-split-size\":\"96MiB\",\"region-max-keys\":1440000,\"region-split-keys\":960000,\"consistency-check-method\":\"mvcc\",\"enable-region-bucket\":false,\"region-bucket-size\":\"96MiB\",\"region-size-threshold-for-approximate\":\"1440MiB\",\"prefer-approximate-bucket\":true,\"region-bucket-merge-size-ratio\":0.33},\"coprocessor-v2\":{\"coprocessor-plugin-directory\":null},\"rocksdb\":{\"info-log-level\":\"info\",\"wal-recovery-mode\":2,\"wal-dir\":\"\",\"wal-ttl-seconds\":0,\"wal-size-limit\":\"0KiB\",\"max-total-wal-size\":\"4GiB\",\"max-background-jobs\":9,\"max-background-flushes\":3,\"max-manifest-file-size\":\"128MiB\",\"create-if-missing\":true,\"max-open-files\":40960,\"enable-statistics\":true,\"stats-dump-period\":\"10m\",\"compaction-readahead-size\":\"0KiB\",\"info-log-max-size\":\"1GiB\",\"info-log-roll-time\":\"0s\",\"info-log-keep-log-file-num\":3,\"info-log-dir\":\"\",\"rate-bytes-per-sec\":\"10GiB\",\"rate-limiter-refill-period\":\"100ms\",\"rate-limiter-mode\":2,\"rate-limiter-auto-tuned\":true,\"bytes-per-sync\":\"1MiB\",\"wal-bytes-per-sync\":\"512KiB\",\"max-sub-compactions\":3,\"writable-file-max-buffer-size\":\"1MiB\",\"use-direct-io-for-flush-and-compaction\":false,\"enable-pipelined-write\":false,\"enable-unordered-write\":false,\"defaultcf\":{\"block-size\":\"64KiB\",\"block-cache-size
```

*(The log is truncated in the original post.)*

| username: dba-kit | Original post link

What configurations did you modify? Did you use the tiup edit-config command to make the changes?

| username: wfxxh | Original post link

Just these three, modified via tiup edit-config.

I don't think it's related to what was modified; the other nodes start fine, only this node fails.

| username: WalterWj | Original post link

This path looks strange. Is it a local disk? If it really can't start, scale out a new node and force this one offline.

| username: wfxxh | Original post link

It’s a local disk, and it really can’t start. I’m planning to scale in this TiKV service and then scale it out again.

| username: wfxxh | Original post link

This is very strange, another node also reported this error.

| username: redgame | Original post link

Is the communication between TiKV nodes normal?

| username: wfxxh | Original post link

Normal.

| username: kavenab | Original post link

The error “tikv: failed to create kv engine” means that there was an issue when creating the TiKV engine.

This error can have multiple causes, including but not limited to the following:

  1. Configuration Issues: Check if the TiKV configuration file is correct and ensure all configuration items are properly set.

  2. Hardware Issues: It could be a hardware failure that prevents the creation of the TiKV engine. Check the server’s hardware status, such as disk space, memory, etc., to ensure they are functioning normally.

  3. Network Issues: Unstable or unavailable network connections might cause the TiKV engine creation to fail. Ensure the network connection is stable and that all nodes in the TiDB cluster can communicate with each other.

  4. Version Mismatch: Compatibility between TiDB, TiKV, and PD (Placement Driver) versions is crucial. Ensure the versions of these components are compatible with each other.
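The hardware and network items above can be checked mechanically. A hedged sketch of such a pre-flight check (the thresholds, hosts, and ports are illustrative assumptions, not values from this thread; PD conventionally listens on 2379):

```python
import os
import shutil
import socket

def check_disk(path: str, min_free_gb: float = 5.0) -> bool:
    """Point 2: is there enough free space on the data disk?"""
    return shutil.disk_usage(path).free / 1024**3 >= min_free_gb

def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Point 3: can this node reach a peer (e.g. PD on 2379)?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage with placeholder targets:
ok_disk = check_disk(os.getcwd(), min_free_gb=0)
ok_net = check_tcp("invalid.invalid", 2379)  # unresolvable host -> False
```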

| username: wfxxh | Original post link

Points 2, 3, and 4 can be ruled out. Based on the configuration in my screenshot, point 1 can likely be ruled out as well.

| username: wfxxh | Original post link

I think the key question is why TiKV looks for a MANIFEST with a number greater than the one present locally.
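One plausible mechanism (an assumption, not a confirmed root cause for this cluster): RocksDB-style engines switch MANIFEST files by writing and fsyncing the new MANIFEST, then atomically repointing CURRENT at it. If a power loss or an unsafe disk cache drops the MANIFEST's data while the CURRENT update survives, startup will look for a MANIFEST numbered higher than anything on disk, which matches this symptom. The switch looks roughly like:

```python
import os
import tempfile

def switch_manifest(dbdir: str, new_num: int) -> None:
    # 1. Write and fsync the new MANIFEST so its contents are durable.
    manifest = f"MANIFEST-{new_num:06d}"
    with open(os.path.join(dbdir, manifest), "w") as f:
        f.write("(version edits)\n")
        f.flush()
        os.fsync(f.fileno())
    # 2. Atomically point CURRENT at it via rename. If step 1's data is
    #    lost in a crash but this rename survives, CURRENT names a
    #    MANIFEST that no longer exists on disk.
    tmp = os.path.join(dbdir, "CURRENT.tmp")
    with open(tmp, "w") as f:
        f.write(manifest + "\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(dbdir, "CURRENT"))

db = tempfile.mkdtemp()
switch_manifest(db, 1430889)
```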

| username: jansu-dev | Original post link

Has it been scaled? Are issues still appearing one after another? I only see one disconnected TiKV.
Is this cluster running on virtual machines? We've seen cases where a virtual machine lost power and the TiKV running on it could no longer start.

A while ago, we encountered a case similar to yours. The customer also said they hadn’t touched anything, but it seemed to be a case of data loss recovery.
If the data is fine, just scale to resolve it. Tracing the root cause can be very difficult.

| username: wfxxh | Original post link

It's not a virtual machine. I only modified the three configurations in the screenshot, and each TiKV node reported this error in turn during the reload. I had to scale in and then scale out again.