【TiDB Usage Environment】Production
WAL is an important means to ensure transaction durability, equivalent to the redo log in Oracle/MySQL. Write operations of a transaction are first persisted to the WAL, and then dirty data is asynchronously flushed to the disk. If the database crashes unexpectedly, the WAL can be used to recover the data.
The Raft log is the log used to replicate data in the Raft consensus protocol; it is how the system provides consistency (the C in the CAP theorem). When a transaction commits on the leader replica, its write operations are wrapped in a Raft log entry, which is then replicated to the other replicas. Once a majority of replicas have appended the entry and it has been applied, the commit can return successfully.
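To make the "replicate to a majority, then return" step concrete, here is a minimal toy sketch in Python. It is not TiKV's implementation: the `Replica` class and `propose` function are hypothetical names, and real Raft also involves terms, log indexes, leader election, and retries, all of which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    online: bool = True
    log: list = field(default_factory=list)    # appended Raft log entries
    state: dict = field(default_factory=dict)  # applied key-value state

    def append(self, entry) -> bool:
        """Append an entry to the local log; an offline replica cannot acknowledge."""
        if not self.online:
            return False
        self.log.append(entry)
        return True

    def apply(self, entry) -> None:
        """Apply an entry to the key-value state."""
        key, value = entry
        self.state[key] = value


def propose(leader, followers, entry) -> bool:
    """Return True only if a majority appended the entry and the leader applied it."""
    replicas = [leader] + followers
    acks = sum(1 for r in replicas if r.append(entry))
    if acks <= len(replicas) // 2:
        return False  # no quorum, so the commit cannot be acknowledged
    leader.apply(entry)
    return True


# Three replicas, one of them down: 2 of 3 is still a majority.
leader = Replica("tikv-1")
followers = [Replica("tikv-2"), Replica("tikv-3", online=False)]
committed = propose(leader, followers, ("k1", "v1"))
print(committed, leader.state)
```

With one replica offline the write still commits, because two of three acknowledgements form a majority. If a second replica went down, `propose` would return False in this sketch.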
TiCDC reads the KV change log directly from TiKV, not the Raft log.
The Raft log is used to synchronize data between different TiKV instances. As a loose analogy, it is like MySQL’s binlog in master-slave replication, while the WAL is a feature of RocksDB, TiKV’s storage engine, and is similar to MySQL’s redo log.
Thank you very much, the additional answer cleared up my confusion. One more question: is the KV change log a third log, independent of the WAL and the Raft log?
I’m ashamed. I took the PCTP exam last year, but now that my workplace has started using TiDB this year, I realize I’ve forgotten everything. Looks like I need to study it all over again.
Under the hood, each TiKV uses two RocksDB instances, one for storing Raft logs and the other for storing data. The Raft log keeps data consistent across the TiKV replicas (three by default) when data is written to TiKV. The log is produced by the Raft protocol and is ultimately persisted in RocksDB, where each entry is stored as an ordinary key-value pair.
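As a rough illustration of "a Raft log entry stored as a regular key-value pair", the sketch below encodes a region ID and a log index into a sortable key. The key layout, the `RAFT_LOG_TAG` prefix, and the `raft_log_key` helper are all made up for this example and are not TiKV's actual encoding. A plain dict stands in for the Raft RocksDB instance.

```python
import struct

RAFT_LOG_TAG = b"raftlog"  # hypothetical prefix, not TiKV's real key layout


def raft_log_key(region_id: int, index: int) -> bytes:
    """Build a key whose byte order matches (region_id, index) numeric order."""
    # Big-endian fixed-width integers sort bytewise in numeric order.
    return RAFT_LOG_TAG + struct.pack(">QQ", region_id, index)


store = {}  # stand-in for the Raft RocksDB instance
payloads = [b"put k1=v1", b"put k2=v2", b"delete k1"]
for index, payload in enumerate(payloads, start=1):
    store[raft_log_key(7, index)] = payload

# A sorted scan over the prefix for region 7 yields its entries in log order.
prefix = RAFT_LOG_TAG + struct.pack(">Q", 7)
entries = [store[k] for k in sorted(store) if k.startswith(prefix)]
print(entries)
```

The point is only that a log can live inside a key-value store: entries are ordinary values, and ordered keys give back the log order on a range scan.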
Now, let’s talk about the WAL (write-ahead log). RocksDB keeps newly written data in the memtable, which is an in-memory table. If a write only reached memory before returning, it would not be durable. So RocksDB first appends the write to the WAL, a log file on disk, and then updates the memtable. Appending to a file is a sequential write, and sequential disk writes are relatively fast.
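The WAL-then-memtable order, and why it makes recovery possible, can be sketched with a toy store. This is a simplified illustration rather than RocksDB's code: `TinyStore`, the JSON-lines record format, and the file name are assumptions made for the example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: append to a WAL file first, then update the memtable."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}
        self._replay()  # rebuild in-memory state from any existing WAL
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Sequential append to the log, forced to disk before acknowledging.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then update the in-memory table.
        self.memtable[key] = value

    def close(self):
        self.wal.close()


wal_file = os.path.join(tempfile.mkdtemp(), "000001.log")
store = TinyStore(wal_file)
store.put("k1", "v1")
store.put("k2", "v2")
store.close()  # the memtable is gone, as it would be after a crash

recovered = TinyStore(wal_file)  # replaying the WAL restores the data
print(recovered.memtable)
```

Because every acknowledged write is already in the log file, a restart can rebuild the memtable by replaying it. A real engine additionally flushes memtables to SST files and discards WAL files that are no longer needed.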
Each TiKV has two RocksDB instances, one for storing raft logs and one for storing actual data, specifically: RocksDB raft and RocksDB db. When writing to the RocksDB db instance, it first writes to the WAL and then to the memtable. My question is, when writing raft logs to the RocksDB raft instance, does it also first write to the WAL and then to the memtable? It should be like this, right?
When RocksDB writes a key/value, it first writes to the WAL and then to the memtable. RocksDB-Raft is simply using a regular RocksDB to store Raft data, so there is no difference.
The key-value pairs written by the user are first written to the WAL (Write Ahead Log) on the disk, which can also be understood as KV change logs. Once a synchronization task is created, the TiCDC cluster will pull these row changed events.
I saw this in an article. Is it correct? Is the WAL the same thing as the KV change log?
Is the KV change log a third type of log, independent of the WAL and the Raft log? I’m about to set up TiCDC, so I want to understand what the KV change log is and how it is written.