How is the DELETE statement in TiDB stored in TiKV?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 删除语句在tikv里面怎么存储的

| username: 大飞飞呀

【TiDB Usage Environment】Production Environment
【Encountered Issue: Issue Phenomenon and Impact】
When a row is deleted in TiDB, how is the deletion stored in TiKV?
Is it something like Key_delete → Value?


Does anyone know the principle? Please share.

| username: Billmay表妹 | Original post link

TiDB stores its data in TiKV, and TiKV persists it in RocksDB, which is built on an LSM tree, an append-only structure: every change, including a delete, is written as a new entry. When a row is deleted, its key is marked as deleted rather than removed from disk right away. In practice, the delete writes a new version of the key whose record says "deleted" and carries no row value (conceptually, a Key_delete entry with an empty value). The older versions and the deletion marker are cleaned up later by GC and RocksDB compaction.
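
A minimal Python sketch of the append-only idea, for illustration only. `AppendOnlyStore`, `TOMBSTONE`, and the method names are hypothetical and do not correspond to TiKV's real API; the point is just that a delete is one more append, and that a read treats the newest marker as "not found".

```python
# Toy append-only store: a delete is recorded as one more entry, not an erase.
# All names here are illustrative; this is not how TiKV is implemented.

TOMBSTONE = object()  # sentinel meaning "this key was deleted"


class AppendOnlyStore:
    def __init__(self):
        self.log = []  # list of (key, value) in write order

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        # Nothing is removed; a deletion marker is appended instead.
        self.log.append((key, TOMBSTONE))

    def get(self, key):
        # Scan from newest to oldest; the newest entry for the key wins.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is TOMBSTONE else v
        return None


store = AppendOnlyStore()
store.put("row1", "alice")
store.delete("row1")
result_after_delete = store.get("row1")  # None: hidden by the marker
entries_kept = len(store.log)            # 2: old value and marker both remain
```

The old value stays in the log until a separate cleanup step removes it, which is the role GC and compaction play in the real system.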

| username: 大飞飞呀 | Original post link

Is the key row_id?

| username: Billmay表妹 | Original post link

The data key is not the bare row_id. The key of a row record is built from the table ID and the row's handle, in the form tablePrefix{TableID}_recordPrefixSep{RowID}, and the value holds the row's column values. When the table has an integer primary key, TiDB uses that value as the RowID; otherwise it assigns an implicit row ID. With a clustered index on a non-integer primary key, the encoded primary key columns take the place of the RowID. Index entries are stored as separate key-value pairs: for a unique index, the key is tablePrefix{TableID}_indexPrefixSep{IndexID}_indexedColumnsValue and the value is the RowID; for a non-unique index, the RowID is appended to the key. When TiDB executes a query through an index, it looks up the index key, reads the RowID, and then fetches the row record by its row key. These are not B+ trees: all of these pairs live in TiKV's ordered key-value store, which is backed by RocksDB's LSM tree.
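
For reference, the key layout that the TiDB documentation describes for row and index records can be sketched as below. This is a simplification: the real keys use a binary, order-preserving encoding, and the function names here are hypothetical helpers, not TiDB APIs.

```python
# Simplified, human-readable version of TiDB's key layout.
# The real keys are binary-encoded so that byte order matches value order.

TABLE_PREFIX = "t"
RECORD_PREFIX_SEP = "_r"
INDEX_PREFIX_SEP = "_i"


def row_key(table_id, row_id):
    # Key of a row record; the value would hold the row's column values.
    return f"{TABLE_PREFIX}{table_id}{RECORD_PREFIX_SEP}{row_id}"


def unique_index_key(table_id, index_id, indexed_value):
    # The value stored under this key is the row ID.
    return f"{TABLE_PREFIX}{table_id}{INDEX_PREFIX_SEP}{index_id}_{indexed_value}"


def non_unique_index_key(table_id, index_id, indexed_value, row_id):
    # The row ID is appended so duplicate indexed values get distinct keys.
    return f"{unique_index_key(table_id, index_id, indexed_value)}_{row_id}"


example_row_key = row_key(10, 1)                        # "t10_r1"
example_index_key = non_unique_index_key(10, 1, 10, 1)  # "t10_i1_10_1"
```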

| username: Billmay表妹 | Original post link

If you are interested in the source code, you can check out:

This is a continuous learning process. If you are interested, make sure to keep learning and keep watching.

| username: 大飞飞呀 | Original post link

My understanding is that when a row is deleted, the key is:
tablePrefix{TableID}_recordPrefixSep{RowID}delete{time}

| username: xfworld | Original post link

This can be understood in many ways, :upside_down_face:
but it must conform to the semantics of the RocksDB compaction filter or the compaction process.

| username: zhanggame1 | Original post link

TiDB's transaction model is based on Percolator. In TiKV, each row of data is stored across three column families (CFs): default, write, and lock.

  • default CF stores the actual data
    ${key}_${start_ts} --> ${value}
  • write CF stores the version information of the data; commit_ts is the version at which a record becomes visible
    ${key}_${commit_ts} --> ${start_ts}
  • lock CF stores lock information. A transaction that is being committed adds a lock, which includes the location of the primary lock
    ${key} --> ${start_ts, primary_key, ..etc}

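
To connect the three CFs to the original question, here is a toy Python model of a put followed by a delete. It is a sketch under assumptions: the dictionaries, the function names, and the `"PUT"`/`"DELETE"` tags are made up for illustration, and the write record here carries a write type next to start_ts, which the notation above omits. Real TiKV uses binary encodings and a full two-phase commit.

```python
# Toy model of the three column families. Names and record shapes are
# simplified assumptions, not TiKV's actual data structures.

default_cf = {}  # (key, start_ts)  -> value
write_cf = {}    # (key, commit_ts) -> (write_type, start_ts)
lock_cf = {}     # key              -> (start_ts, primary_key)


def prewrite(key, start_ts, primary_key, value=None):
    lock_cf[key] = (start_ts, primary_key)
    if value is not None:  # a delete carries no value
        default_cf[(key, start_ts)] = value


def commit(key, start_ts, commit_ts, write_type):
    write_cf[(key, commit_ts)] = (write_type, start_ts)
    del lock_cf[key]


def read(key, read_ts):
    # Find the newest committed version visible at read_ts.
    versions = [ts for (k, ts) in write_cf if k == key and ts <= read_ts]
    if not versions:
        return None
    write_type, start_ts = write_cf[(key, max(versions))]
    if write_type == "DELETE":
        return None
    return default_cf[(key, start_ts)]


# Insert with start_ts=10, commit_ts=11; delete with start_ts=20, commit_ts=21.
prewrite("t10_r1", 10, "t10_r1", value="alice")
commit("t10_r1", 10, 11, "PUT")
prewrite("t10_r1", 20, "t10_r1")
commit("t10_r1", 20, 21, "DELETE")

before_delete = read("t10_r1", 15)  # "alice": snapshot taken before the delete
after_delete = read("t10_r1", 25)   # None: newest visible record is a DELETE
```

In this model the delete adds a new entry to the write CF and leaves the old value in the default CF untouched, so older snapshots can still read it until GC runs.
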
| username: Anna | Original post link

Check this out: 三篇文章了解 TiDB 技术内幕 - 说存储 (Understanding TiDB Internals in Three Articles: Storage) | PingCAP

| username: h5n1 | Original post link

[Image not available]

| username: 南征北战 | Original post link

At the Region level, TiKV appends a record marking the data as deleted. At the LSM level, the deleted data is physically removed later, when files are split or merged during compaction.
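
The physical removal during merging can be pictured with a toy compaction in Python. This is only a sketch of the general LSM idea, not RocksDB's algorithm; `compact` and its `bottom_level` parameter are hypothetical, and `None` stands in for a deletion marker.

```python
# Toy compaction: merge a newer and an older sorted run into one.
# None stands for a deletion marker (tombstone). Illustrative only.

def compact(newer, older, bottom_level=True):
    merged = dict(older)
    merged.update(newer)  # newer entries shadow older ones
    out = []
    for key in sorted(merged):
        value = merged[key]
        if value is None and bottom_level:
            # No older data remains below this level, so the marker
            # itself can be dropped along with the data it hides.
            continue
        out.append((key, value))
    return out


older_run = [("a", "1"), ("b", "2"), ("c", "3")]
newer_run = [("b", None), ("d", "4")]  # "b" was deleted after older_run

compacted = compact(newer_run, older_run)
# compacted == [("a", "1"), ("c", "3"), ("d", "4")]
```

When the output is not the lowest level, the marker has to be kept, because an older copy of the key might still exist further down.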

| username: redgame | Original post link

Mark the key of the data as deleted.

| username: xiaohaozifeifeifei | Original post link

A delete adds one more version of the key; garbage collection (GC) removes the old versions later.
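
A rough Python sketch of version cleanup against a GC safe point, assuming a simplified model in which each key has a map from commit_ts to value and `None` means "deleted at this version". The function `gc` and this data shape are hypothetical, not TiKV's implementation.

```python
# Toy MVCC garbage collection. versions maps commit_ts -> value,
# with None meaning "deleted at this version". Illustrative only.

def gc(versions, safe_point):
    # Keep every version newer than the safe point.
    kept = {ts: v for ts, v in versions.items() if ts > safe_point}
    # Of the versions at or before the safe point, only the newest one
    # can still be read by a snapshot taken at the safe point.
    old = [ts for ts in versions if ts <= safe_point]
    if old:
        newest_old = max(old)
        if versions[newest_old] is not None:
            kept[newest_old] = versions[newest_old]
        # If that newest old version is a delete marker, nothing older is
        # visible any more, so the marker itself is dropped as well.
    return kept


history = {11: "alice", 21: "bob", 31: None}  # insert, update, delete

after_early_gc = gc(history, safe_point=25)  # {21: "bob", 31: None}
after_late_gc = gc(history, safe_point=40)   # {}: the key is fully gone
```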