TiKV Fails to Start Due to Raft Log Loss

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV 因raftlog丢失启动失败

| username: Timber

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Reproduction Path] After an on-site power outage, the TiKV server failed to start, with an error indicating that raftlog might be lost.
[Problem Encountered: Problem Phenomenon and Impact] TiKV fails to start. Is there a way to forcibly skip the damaged raftlog and restart TiKV? I tried setting recover-mode to tolerate-any-corruption, but startup still reports a raft data loss error. In earlier tests, once the amount of lost raftlog exceeded a certain threshold, this setting stopped having any effect. In theory, though, losing the raftlog on a single node should not affect the data integrity of the cluster as a whole, so rebuilding the TiKV node just for this seems like overkill.
[Attachment: Screenshot/Log/Monitoring]

| username: MrSylar

I ran into the same issue. After struggling with it for a while, I ended up rebuilding the cluster, although that is admittedly not an ideal solution.

| username: xfworld

It seems there is currently no supported way to do this manually. Even if you don’t rebuild this node, you would still need some form of lossy recovery, which is quite troublesome. Rebuilding might be simpler… :rofl:

| username: Timber

Sigh, it’s really a bit of a headache.

| username: Timber

So, may I ask: if we set bytes-per-sync=0 in the TiKV configuration so that Raft logs are written straight to disk, would recovery be easier after a power outage?

| username: xfworld

You can refer to the following:

bytes-per-sync is a parameter of TiKV’s RocksDB that controls how often data being written is incrementally synced to disk. Each time roughly bytes-per-sync bytes have been written to a file, RocksDB asks the operating system to sync that data in the background, which smooths out write I/O over time. A value of 0 turns this incremental syncing off and leaves the timing of write-back to the operating system; it does not mean that every write is followed by an fsync. A larger value means fewer, larger syncs, and a smaller value means more frequent ones. RocksDB’s own documentation cautions that this option should not be relied on as a persistence guarantee.

wal-bytes-per-sync and bytes-per-sync are both parameters of TiKV’s RocksDB that control how often written data is incrementally synced. They work the same way but apply to different files.

Specifically, bytes-per-sync applies to SST files, while wal-bytes-per-sync applies to WAL files. WAL files are RocksDB’s Write-Ahead Logs, which record data changes to preserve consistency and durability, and by default they are synced at a smaller interval than SST files.

In TiKV, the default values of bytes-per-sync and wal-bytes-per-sync are "1MB" and "512KB" respectively, and they can be adjusted based on actual conditions. Setting either parameter to 0 disables incremental syncing for that file type rather than forcing a sync on every write, so it is unlikely by itself to make recovery after a power outage easier.
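
To make the interval behavior concrete, here is a minimal Python sketch. It is not RocksDB or TiKV code: the function name write_with_incremental_sync and its arguments are hypothetical, and os.fsync is used only as a simple blocking stand-in for the background sync that RocksDB requests. It just counts how many incremental syncs are issued while writing 2 MiB at different interval settings.

```python
import os
import tempfile


def write_with_incremental_sync(path, chunks, bytes_per_sync):
    """Write chunks to path, syncing once at least bytes_per_sync bytes are pending.

    bytes_per_sync == 0 disables incremental syncing entirely.
    Returns the number of incremental sync calls that were issued.
    """
    sync_calls = 0
    unsynced = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            unsynced += os.write(fd, chunk)
            if bytes_per_sync > 0 and unsynced >= bytes_per_sync:
                # Assumption: a blocking fsync stands in for RocksDB's
                # asynchronous, background sync request.
                os.fsync(fd)
                sync_calls += 1
                unsynced = 0
    finally:
        os.close(fd)
    return sync_calls


def demo():
    """Count incremental syncs for 8 chunks of 256 KiB at three settings."""
    chunks = [b"x" * (256 * 1024)] * 8
    settings = [("0", 0), ("512KB", 512 * 1024), ("1MB", 1024 * 1024)]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        for label, limit in settings:
            results[label] = write_with_incremental_sync(path, chunks, limit)
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch a setting of 0 issues no incremental syncs at all, while smaller non-zero intervals issue more of them. Forcing every single write to be durable is a separate mechanism from these interval settings.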