The relationship between TiKV's SST files, RocksDB Raft, and RocksDB KV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv的sst文件与rocksdb raft与rocksdb kv关系

| username: 路在何chu


After watching the 101 course, specifically the TiKV chapter, there are some parts I didn’t understand. I’ve only recently started working with TiDB, so I hope the experts here can guide me. Below are two images for reference:


Here’s my understanding:
As I understand it, the WAL corresponds to the leader writing to RocksDB Raft first, similar to MySQL's redo log, and SST files are like dirty data being flushed to disk. However, the course says that the entries in the leader's RocksDB Raft also need to be applied to RocksDB KV, which I don't quite understand. The leader's Raft log is sent to the follower nodes and applied there as SST files. Why does it also need to be applied to RocksDB KV on the leader's own node? Is RocksDB KV the same thing as the SST files? I would appreciate any guidance from the experts.

| username: 随缘天空 | Original post link

First of all, you need to know that TiKV does not write data to disk directly; persistence is handled by RocksDB, which stores both data and metadata. Raft is the distributed consensus protocol that TiKV uses to keep the replicas in the cluster consistent and reliable. RocksDB KV is the RocksDB instance that stores TiKV's key-value data. SST files are the on-disk format in which RocksDB stores data: persisted data lives in these files, and data migration and backup can be performed using SST files.

| username: Jolyne | Original post link

RaftDB (RocksDB Raft) persists the Raft logs; those logs are then applied to RocksDB KV, which persists the actual data.
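
As a rough illustration of that two-step flow, here is a toy model in Python. It is only a sketch, not TiKV code: `ToyStore`, `append`, and `apply_committed` are hypothetical names, a plain list stands in for RocksDB Raft, and a plain dict stands in for RocksDB KV. It assumes the entry is committed as soon as both nodes have it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one node: a Raft log store plus a KV store (hypothetical)."""
    raft_log: list = field(default_factory=list)  # stands in for RocksDB Raft
    kv: dict = field(default_factory=dict)        # stands in for RocksDB KV
    applied_index: int = 0                        # log entries already applied

    def append(self, key, value):
        """Persist the write as a log entry; the KV store is not touched yet."""
        self.raft_log.append((key, value))
        return len(self.raft_log)  # 1-based index of the new entry

    def apply_committed(self, commit_index):
        """Replay committed entries, in order, into the KV store."""
        while self.applied_index < commit_index:
            key, value = self.raft_log[self.applied_index]
            self.kv[key] = value
            self.applied_index += 1


leader, follower = ToyStore(), ToyStore()

# 1. The leader appends the entry to its own log and replicates it.
idx = leader.append("k1", "v1")
follower.append("k1", "v1")

# 2. Once the entry is committed, every replica, the leader included,
#    applies it to its own KV store.
leader.apply_committed(idx)
follower.apply_committed(idx)
```

The point of the sketch is that the log only records what should happen; reads are served from the KV store, so the leader has to apply its own log just as the followers do.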

| username: 路在何chu | Original post link

The SST file is persistent, and the memtable is also written to the SST file. Why is there a need for two instances of persistence?

| username: 路在何chu | Original post link

I know about RocksDB. I just want to ask if the SST file is flushed from the memtable or if it’s the result of RocksDB raft applying to RocksDB KV. If it’s just flushed from the memtable, then RocksDB KV is also persistent data, which means the data exists in two copies, unless RocksDB KV is the SST file.

| username: Fly-bird | Original post link

It should be uploaded to PD and then inform PD that the data file has been written to disk.

| username: 路在何chu | Original post link

I don’t quite understand.

| username: 随缘天空 | Original post link

The memtable should store key-value pairs, while the actual data should be stored in SST files.


For more detailed information, you can refer to the official documentation: RocksDB Overview | PingCAP Docs

| username: 路在何chu | Original post link

OK, so applying the RocksDB Raft logs to RocksDB KV means replaying the log entries into a memtable, which is essentially like dirty pages, and then flushing the memtable to disk to form SST files. The course didn't explain this very clearly.
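
To make the memtable-to-SST step concrete, here is a minimal sketch of that write path. It is a toy, not RocksDB's implementation: `ToyLSM` and `memtable_limit` are hypothetical, the flush threshold counts entries rather than bytes, and each "SST file" is just a sorted tuple held in memory.

```python
class ToyLSM:
    """Toy LSM engine: WAL plus memtable, flushed into sorted immutable 'SST' runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entry count)
        self.wal = []        # stands in for the engine's write-ahead log
        self.memtable = {}   # in-memory writes not yet flushed
        self.sst_files = []  # each flush adds one sorted, immutable tuple of pairs

    def put(self, key, value):
        self.wal.append((key, value))  # log first, for crash recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted SST and start a fresh memtable."""
        self.sst_files.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.wal.clear()  # simplification: flushed data no longer needs the WAL

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sst_files):  # newest SST first
            for k, v in sst:
                if k == key:
                    return v
        return None


kv_engine = ToyLSM(memtable_limit=2)
kv_engine.put("b", "1")
kv_engine.put("a", "2")  # reaches the limit, triggers a flush
kv_engine.put("c", "3")  # stays in the new memtable
```

Since RocksDB Raft and RocksDB KV are both RocksDB instances, each one would go through this path and end up with its own SST files. The two sets of files hold different things (log entries versus applied key-value data), so they are not two copies of the same data.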
