The K-V Structure of TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb的k-v结构

| username: liul

How K-V pairs are generated in TiDB

In TiDB, data is ultimately stored in TiKV, which uses RocksDB as its bottom layer. To understand how data is stored, we therefore need to understand how the keys and values are built.

There are several main types of K-V:

Creating a table

For the key structure of the table, refer to EncodeTablePrefix.

The key is 9 bytes long: 1 byte for the prefix and 8 bytes for the table ID.

The key is formed by prepending ‘t’ to the unique table ID generated earlier. The unique ID is generated by the function assignTableID.

Composition: t{tableID}
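
The 9-byte layout above can be sketched in a few lines of Python. This is only an illustration, not TiDB's Go code: the function names are made up for this example, the table ID 107 is borrowed from the debug output later in this post, and the sign-bit flip follows the EncodeIntToCmpUint formula quoted there.

```python
import struct

SIGN_MASK = 0x8000000000000000  # flips the sign bit of a 64-bit integer
TABLE_PREFIX = b"t"

def encode_int_to_cmp_uint(v: int) -> int:
    """Map a signed 64-bit integer to an unsigned one that sorts the same way."""
    return (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK

def encode_table_prefix(table_id: int) -> bytes:
    """Return 't' followed by the 8-byte big-endian comparable form of table_id."""
    return TABLE_PREFIX + struct.pack(">Q", encode_int_to_cmp_uint(table_id))

key = encode_table_prefix(107)
print(len(key), list(key))  # 9 [116, 128, 0, 0, 0, 0, 0, 0, 107]
```

Because the sign bit is flipped before the big-endian write, comparing the resulting bytes gives the same order as comparing the original signed integers.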

Code:


K-V formed by inserting data into the table

Key: tablePrefix{tableID}_recordPrefixSep{rowID}

Byte:   1    2-9       10-11   12-19
Value:  t    tableID   _r      rowID

Value: [col1, col2, col3, col4]

The key is made up of two parts: the table ID and the row ID.
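
As a rough sketch of the row key layout described above (hypothetical Python helper names, not the actual TiDB functions), the 19-byte key can be built like this. It assumes the row ID is encoded the same way as the table ID, with the sign bit flipped.

```python
import struct

SIGN_MASK = 0x8000000000000000

def encode_int(v: int) -> bytes:
    """8-byte big-endian encoding with the sign bit flipped, so bytes sort like ints."""
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)

def encode_row_key(table_id: int, row_id: int) -> bytes:
    """Build t{tableID}_r{rowID}: 1 + 8 + 2 + 8 = 19 bytes."""
    return b"t" + encode_int(table_id) + b"_r" + encode_int(row_id)

key = encode_row_key(107, 1)
print(len(key), key[9:11])  # 19 b'_r'

# Keys of one table share a prefix and sort by row ID.
keys = sorted(encode_row_key(107, r) for r in (3, -1, 2))
print(keys == [encode_row_key(107, r) for r in (-1, 2, 3)])  # True
```

Since all rows of one table share the same 11-byte prefix, they sit next to each other in the ordered key space and are sorted by row ID.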

Code: (two images in the original post)


Debug verification as shown in the figure

Index K-V

There are two types of indexes:

Unique index

Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue

Value: rowID

Non-unique index

Analyzing the code and the debug output gives the following key composition:

Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue_rowID

Byte:   1    2-9       10-11   12-19     20          21-28        29          30-37
Value:  t    tableID   _i      indexID   valueType   indexValue   valueFlag   rowID

Value: null
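
The difference between the two index key layouts can be sketched as follows. This is a simplified illustration with hypothetical function names: it only handles a single integer index column, and it takes the type flag value 3 from the debug output in this post.

```python
import struct

SIGN_MASK = 0x8000000000000000
INT_FLAG = 3  # type flag for integers, as seen in the debug output in this post

def encode_int(v: int) -> bytes:
    """8-byte big-endian encoding with the sign bit flipped."""
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)

def encode_unique_index_key(table_id: int, index_id: int, value: int) -> bytes:
    """t{tableID}_i{indexID}{flag}{value}; the row ID goes into the value part."""
    prefix = b"t" + encode_int(table_id) + b"_i" + encode_int(index_id)
    return prefix + bytes([INT_FLAG]) + encode_int(value)

def encode_non_unique_index_key(table_id: int, index_id: int,
                                value: int, row_id: int) -> bytes:
    """Same as the unique key, with {flag}{rowID} appended to make the key unique."""
    unique_part = encode_unique_index_key(table_id, index_id, value)
    return unique_part + bytes([INT_FLAG]) + encode_int(row_id)

unique_key = encode_unique_index_key(107, 1, 2)
non_unique_key = encode_non_unique_index_key(107, 1, 2, 90002)
print(len(unique_key), len(non_unique_key))  # 28 37
```

Appending the row ID is what lets several rows with the same indexed value coexist: each of them gets a distinct key, so nothing needs to be stored in the value.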

Debug verification in the code:


Explanation:

116: ‘t’ tablePrefix

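128,0,0,0,0,0,0,107: table ID. The table ID here is 107; the leading 128 (0x80) comes from flipping the sign bit, so that the signed integer compares correctly as unsigned bytes.

Reference: the EncodeIntToCmpUint function, which computes uint64(v) ^ 0x8000000000000000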

95,105: ‘_i’ indexPrefixSep

128,0,0,0,0,0,0,1: indexID

3: type flag of the indexed value (intFlag)

128,0,0,0,0,0,0,2: indexed value (2)

3: type flag of the rowid (intFlag)

128,0,0,0,0,1,95,146: rowid (90002)
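
To check the explanation above, the byte sequence can be split and decoded with a small script. This is a minimal sketch with a hypothetical function name; it assumes exactly the 37-byte layout shown earlier (one integer index column plus an integer rowid).

```python
import struct

SIGN_MASK = 0x8000000000000000

def decode_int(b: bytes) -> int:
    """Undo the sign-bit flip and return a signed 64-bit integer."""
    u = struct.unpack(">Q", b)[0] ^ SIGN_MASK
    return u - (1 << 64) if u >= (1 << 63) else u

def decode_non_unique_int_index_key(key: bytes) -> dict:
    """Split a 37-byte non-unique index key into its fields."""
    if len(key) != 37 or key[0:1] != b"t" or key[9:11] != b"_i":
        raise ValueError("not a 37-byte t..._i... index key")
    return {
        "table_id": decode_int(key[1:9]),
        "index_id": decode_int(key[11:19]),
        "value_flag": key[19],
        "value": decode_int(key[20:28]),
        "row_id_flag": key[28],
        "row_id": decode_int(key[29:37]),
    }

dump = bytes([116,
              128, 0, 0, 0, 0, 0, 0, 107,
              95, 105,
              128, 0, 0, 0, 0, 0, 0, 1,
              3,
              128, 0, 0, 0, 0, 0, 0, 2,
              3,
              128, 0, 0, 0, 0, 1, 95, 146])
fields = decode_non_unique_int_index_key(dump)
print(fields["table_id"], fields["index_id"], fields["value"], fields["row_id"])
# 107 1 2 90002
```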

Partition index

Key: tablePrefix{partitionID}_indexPrefixSep{indexID}_indexedColumnsValue

Value: rowID

Regarding data storage: each column value is stored according to its datum type.

Let’s illustrate with an example:

create table t4(c1 varchar(20), c2 int, c3 double, c4 float);

insert into t4 values('hello', 1, 1.2345, 6.7890);

When we insert data, the corresponding four columns of data are converted from their original types to storage types. The results before and after conversion are shown in the figure:




The specific representation of the converted data types is shown in the figure below:

This is my first time posting technical content like this, and I hope everyone can give me some guidance.

| username: 裤衩儿飞上天

:+1: :+1: :+1:

| username: cassblanca

I’ve learned a lot.

| username: Kongdom

:call_me_hand: :call_me_hand: :call_me_hand:

| username: zhanggame1

Learned.

| username: Jellybean

A source code analysis. This is a very good piece of technical sharing, thumbs up.