What exactly does TiDB do during a delete operation?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: delete操作的时候,tidb具体是操作了什么

| username: 大鱼海棠

[TiDB Usage Environment] Research
[TiDB Version] v5.4.1
[Encountered Problem] What happens under the hood when TiDB processes a DELETE statement? Could you explain what is done in the Raft RocksDB instance (raftdb) and in the KV RocksDB instance (kvdb)?

| username: ddhe9527

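At the storage level, a delete can be thought of as appending a new record to RocksDB rather than removing anything: the record has the same key as the data being deleted and is tagged with the type kTypeDeletion (a "tombstone") instead of carrying a value. (This is simplified: in TiKV, a TiDB DELETE first goes through the MVCC/transaction layer, which writes a new version marking the row as deleted; older versions are cleaned up later by GC.)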
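
To make the idea concrete, here is a minimal Python sketch, assuming a toy in-memory append-only log rather than real RocksDB. The class and method names (`AppendOnlyLog`, `InternalRecord`, `put`, `delete`, `get`) are hypothetical; only the two type tags are named after RocksDB's `kTypeValue` and `kTypeDeletion`. It shows that a delete adds a record instead of removing one, and that a reader treats the newest record for a key as authoritative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ValueType(Enum):
    # Named after RocksDB's internal type tags (kTypeDeletion, kTypeValue).
    K_TYPE_DELETION = 0
    K_TYPE_VALUE = 1


@dataclass(frozen=True)
class InternalRecord:
    user_key: bytes
    seq: int                      # monotonically increasing sequence number
    vtype: ValueType              # type tag; a tombstone carries no value
    value: Optional[bytes] = None


class AppendOnlyLog:
    """Hypothetical toy store: records are only appended, never overwritten."""

    def __init__(self) -> None:
        self.records: List[InternalRecord] = []
        self.next_seq = 1

    def _append(self, key: bytes, vtype: ValueType, value: Optional[bytes]) -> InternalRecord:
        rec = InternalRecord(key, self.next_seq, vtype, value)
        self.next_seq += 1
        self.records.append(rec)
        return rec

    def put(self, key: bytes, value: bytes) -> InternalRecord:
        return self._append(key, ValueType.K_TYPE_VALUE, value)

    def delete(self, key: bytes) -> InternalRecord:
        # A delete appends a tombstone with the same user key; the old value stays.
        return self._append(key, ValueType.K_TYPE_DELETION, None)

    def get(self, key: bytes) -> Optional[bytes]:
        # The newest record for the key (highest sequence number) wins.
        for rec in reversed(self.records):
            if rec.user_key == key:
                return rec.value if rec.vtype is ValueType.K_TYPE_VALUE else None
        return None


log = AppendOnlyLog()
log.put(b"row1", b"alice")
log.delete(b"row1")
print(len(log.records), log.get(b"row1"))  # prints: 2 None
```

Both records are still present after the delete; the tombstone simply shadows the older value.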

The Raft layer process is roughly as follows:

  1. Propose: The Leader converts the delete operation into a Raft log entry, which carries the Region ID, the log index and term, and the actual write content.
  2. Append: The Leader persists the entry to its local Raft RocksDB instance (raftdb).
  3. Replicate: The Leader sends the entry to the Followers. Each Follower persists it to its own raftdb and reports back to the Leader that the append succeeded.
  4. Committed: Once a majority of replicas (including the Leader itself) have appended the entry, the Leader considers it committed.
  5. Apply: The committed entry is read from the Raft log and applied to the KV RocksDB instance (kvdb).
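
The five steps can be sketched as a toy, single-process simulation in Python. This is an illustration under heavy assumptions: there is no networking, no terms or leader election, and followers apply immediately instead of learning the commit index from later messages as real Raft (and TiKV) does. `Replica`, `Entry`, and `propose_delete` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    region_id: int
    index: int
    command: Tuple[str, bytes]  # e.g. ("delete", b"row1")


@dataclass
class Replica:
    name: str
    alive: bool = True
    raftdb: List[Entry] = field(default_factory=list)     # persisted Raft log (toy)
    kvdb: Dict[bytes, str] = field(default_factory=dict)  # applied state (toy)
    applied_index: int = 0

    def append(self, entry: Entry) -> bool:
        # Persist the entry to the local Raft log and acknowledge it.
        if not self.alive:
            return False
        self.raftdb.append(entry)
        return True

    def apply_up_to(self, commit_index: int) -> None:
        # Apply committed entries, in index order, to the KV store.
        for entry in self.raftdb:
            if self.applied_index < entry.index <= commit_index:
                op, key = entry.command
                if op == "delete":
                    self.kvdb[key] = "<tombstone>"
                self.applied_index = entry.index


def propose_delete(leader: Replica, followers: List[Replica], region_id: int, key: bytes) -> bool:
    # 1. Propose: wrap the delete in a log entry with the next index.
    entry = Entry(region_id, len(leader.raftdb) + 1, ("delete", key))
    # 2. Append: the leader persists the entry locally (leader assumed alive).
    acks = 1 if leader.append(entry) else 0
    # 3. Replicate: each reachable follower persists it and acknowledges.
    acks += sum(1 for f in followers if f.append(entry))
    # 4. Committed: a strict majority of all replicas, leader included.
    if acks * 2 <= 1 + len(followers):
        return False
    # 5. Apply: in this toy, every live replica applies right away.
    for r in [leader, *followers]:
        if r.alive:
            r.apply_up_to(entry.index)
    return True


leader = Replica("tikv-1")
followers = [Replica("tikv-2"), Replica("tikv-3", alive=False)]
ok = propose_delete(leader, followers, region_id=2, key=b"row1")
print(ok, leader.kvdb, followers[1].raftdb)
```

With one of three replicas down, two acknowledgements still form a majority, so the entry commits and is applied.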

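When kvdb receives this record, RocksDB first writes it to its write-ahead log (WAL) and to the memtable. When the memtable fills up, it becomes an immutable memtable and is flushed to an SST file on disk. The old data is only physically removed later, during compaction, when the tombstone is merged with the older versions of the same key.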
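
A toy LSM tree in Python can illustrate this write path and why the old value survives until compaction. It is a sketch under stated assumptions: `ToyLSM` and its methods are hypothetical, a tombstone is encoded as the value `None`, flushing is an explicit call rather than a background job, and `compact_all` merges everything into a single bottom-level file, which is the only reason it may discard tombstones.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # hypothetical encoding: a deleted key is stored with value None
Record = Tuple[int, Optional[bytes]]  # (sequence number, value or TOMBSTONE)


class ToyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.wal: List[Tuple[int, bytes, Optional[bytes]]] = []
        self.memtable: Dict[bytes, Record] = {}
        self.immutables: List[Dict[bytes, Record]] = []  # oldest first
        self.sst_files: List[Dict[bytes, Record]] = []   # oldest first
        self.seq = 0

    def write(self, key: bytes, value: Optional[bytes]) -> None:
        self.seq += 1
        self.wal.append((self.seq, key, value))  # log for durability first
        self.memtable[key] = (self.seq, value)   # then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            # A full memtable becomes immutable; a fresh one takes new writes.
            self.immutables.append(self.memtable)
            self.memtable = {}

    def delete(self, key: bytes) -> None:
        self.write(key, TOMBSTONE)

    def flush(self) -> None:
        # Each immutable memtable is written out as its own sorted SST file.
        while self.immutables:
            self.sst_files.append(dict(sorted(self.immutables.pop(0).items())))

    def compact_all(self) -> None:
        # Merge every SST, keep only the newest record per key, and drop
        # tombstones (safe here only because everything ends up in one
        # bottom-level file, so no older copy of a key can remain below).
        merged: Dict[bytes, Record] = {}
        for sst in self.sst_files:
            for key, rec in sst.items():
                if key not in merged or rec[0] > merged[key][0]:
                    merged[key] = rec
        self.sst_files = [{k: r for k, r in sorted(merged.items()) if r[1] is not TOMBSTONE}]


db = ToyLSM(memtable_limit=2)
db.write(b"row1", b"alice")
db.write(b"row2", b"bob")    # memtable full, becomes immutable
db.delete(b"row1")
db.write(b"row3", b"carol")  # second immutable memtable
db.flush()
# Before compaction, the old value and its tombstone sit in different SSTs.
before_compaction = [dict(sst) for sst in db.sst_files]
print(before_compaction)
db.compact_all()
print(db.sst_files)
```

Only after `compact_all` does the value for `row1` actually disappear from the files.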

| username: 大鱼海棠

So before compaction, when this table is queried, TiKV still reads through the deleted data. Is that correct?

| username: ddhe9527

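A read first checks the Memtable and Immutable Memtables in memory, then the SST files on disk, searching from Level 0 downwards (SST data blocks already held in the Block Cache are served from memory instead of disk). Newer data always sits higher on this path than older data, so before compaction a read encounters the record written by the delete first, learns that the row has been deleted, and does not return it. A point lookup stops right there; a range scan still has to step over the tombstones, which is why a large number of deletes can slow scans down until compaction removes them.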
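
The lookup order can be sketched in Python. Assumptions: `ToyReadPath` is hypothetical, each toy "file" is a dict standing in for SST data blocks, the block cache caches whole toy files, and L1 and lower levels are scanned linearly instead of by key range (real RocksDB also uses Bloom filters to skip files). What it illustrates is that the newest version of a key is found first, so a tombstone in L0 hides an older value in L2.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, Optional[bytes]]  # (sequence number, value); None = tombstone


class ToyReadPath:
    def __init__(self) -> None:
        self.memtable: Dict[bytes, Record] = {}
        self.immutables: List[Dict[bytes, Record]] = []            # oldest first
        self.levels: List[List[Dict[bytes, Record]]] = [[], [], []]  # L0 files oldest first
        self.block_cache: Dict[Tuple[int, int], Dict[bytes, Record]] = {}
        self.disk_reads = 0

    def _read_sst(self, level: int, idx: int) -> Dict[bytes, Record]:
        # Served from the block cache when present; otherwise count a "disk read".
        cache_key = (level, idx)
        if cache_key not in self.block_cache:
            self.disk_reads += 1
            self.block_cache[cache_key] = self.levels[level][idx]
        return self.block_cache[cache_key]

    def get(self, key: bytes) -> Tuple[Optional[bytes], str]:
        # 1. In-memory tables, newest first.
        if key in self.memtable:
            return self.memtable[key][1], "memtable"
        for imm in reversed(self.immutables):
            if key in imm:
                return imm[key][1], "immutable memtable"
        # 2. SST files: L0 files may overlap, so check them newest first,
        #    then each deeper level in turn.
        for level, files in enumerate(self.levels):
            order = reversed(range(len(files))) if level == 0 else range(len(files))
            for idx in order:
                sst = self._read_sst(level, idx)
                if key in sst:
                    # First hit is the newest version; a tombstone ends the search.
                    return sst[key][1], f"L{level}"
        return None, "not found"


db = ToyReadPath()
db.levels[2] = [{b"row1": (1, b"alice"), b"row2": (2, b"bob")}]  # old data at the bottom
db.levels[0] = [{b"row1": (5, None)}]                              # newer tombstone in L0
print(db.get(b"row1"))  # prints: (None, 'L0')
print(db.get(b"row2"))  # prints: (b'bob', 'L2')
```

The lookup for `row1` never reaches the old value in L2, and repeated lookups reuse the cached toy files.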

| username: 大鱼海棠

Thank you, it was explained very clearly.

| username: TiDBer_N7Dgd7O4

May I ask: for a delete, is the check for whether the data exists performed before the Raft log is appended, or during the Apply phase?