KV Issues in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb中关于k-v问题

| username: liul

Can anyone explain how the k-v generation works in TiKV?
How is the table information converted into a k-v structure when creating a table?
When inserting data, how is the key generated and how is the value filled? Does anyone have any insights on this?

| username: 裤衩儿飞上天 | Original post link

You can check out the TiDB server chapter in the 302 course, which provides a detailed introduction.

Advanced TiDB System Management [TiDB v5] (302) (pingcap.com)

| username: redgame | Original post link

The moderator is right.

| username: 有猫万事足 | Original post link

You can start diving into this series directly.

The content you need is explained in detail in this article:

| username: tidb菜鸟一只 | Original post link

There are two scenarios, and you can understand it like this:

For a clustered table, the row key is tablePrefix{TableID}_recordPrefixSep{Col1}, which consists of the table ID and the primary key (Col1 here). The value holds the remaining fields of the row. You can picture it as a serialized record, although TiDB actually stores it in its own binary row encoding rather than as a literal JSON string.

For a non-clustered table, the row key is tablePrefix{TableID}_recordPrefixSep{_Tidb_RowID}, which is a concatenation of the table ID and _Tidb_RowID (a hidden row ID generated automatically only for non-clustered tables). The value holds all fields of the row, in the same binary row encoding.

When you query a field’s value using the primary key of a clustered table, it forms a point query. The table ID and primary key ID you provide can already determine the key in TiKV. At this point, you can directly retrieve the corresponding field from the value in TiKV.

However, when you query a field’s value using the primary key of a non-clustered table, you cannot determine the key directly. You need to first obtain the key stored in TiKV through the primary key index, and then use this key to retrieve the corresponding field from the value in TiKV.
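
The two lookup paths described above can be sketched in Go roughly as follows. This is only an illustration under simplifying assumptions, not TiDB's actual code: the store is an in-memory map, the helper names (rowKey, pkIndexKey, lookupClustered, lookupNonClustered) are hypothetical, the "_i" index prefix and the 8-byte row ID stored as the index value are simplifications, and row values are plain strings instead of TiDB's encoded rows.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const signMask uint64 = 1 << 63

// encodeInt appends v as 8 big-endian bytes with the sign bit flipped, so that
// comparing the bytes gives the same order as comparing the integers.
func encodeInt(b []byte, v int64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(v)^signMask)
	return append(b, buf[:]...)
}

// decodeInt reverses encodeInt for an 8-byte slice.
func decodeInt(b []byte) int64 {
	return int64(binary.BigEndian.Uint64(b) ^ signMask)
}

// rowKey builds t{tableID}_r{handle} (hypothetical helper).
func rowKey(tableID, handle int64) string {
	k := []byte{'t'}
	k = encodeInt(k, tableID)
	k = append(k, "_r"...)
	k = encodeInt(k, handle)
	return string(k)
}

// pkIndexKey builds t{tableID}_i{indexID}{pk} for a unique primary-key index
// on a non-clustered table (hypothetical helper, integer primary key only).
func pkIndexKey(tableID, indexID, pk int64) string {
	k := []byte{'t'}
	k = encodeInt(k, tableID)
	k = append(k, "_i"...)
	k = encodeInt(k, indexID)
	k = encodeInt(k, pk)
	return string(k)
}

// lookupClustered: the primary key is the handle, so one read is enough.
func lookupClustered(kv map[string][]byte, tableID, pk int64) ([]byte, bool) {
	row, ok := kv[rowKey(tableID, pk)]
	return row, ok
}

// lookupNonClustered: read the index entry to get the hidden row ID,
// then read the row stored under that row ID.
func lookupNonClustered(kv map[string][]byte, tableID, indexID, pk int64) ([]byte, bool) {
	rowIDBytes, ok := kv[pkIndexKey(tableID, indexID, pk)]
	if !ok {
		return nil, false
	}
	row, ok := kv[rowKey(tableID, decodeInt(rowIDBytes))]
	return row, ok
}

func main() {
	kv := map[string][]byte{}

	// Clustered table 10: the row with primary key 7 lives directly under that key.
	kv[rowKey(10, 7)] = []byte("alice")

	// Non-clustered table 20: the row lives under hidden row ID 1001,
	// and index 1 maps primary key 7 to that row ID.
	kv[rowKey(20, 1001)] = []byte("bob")
	kv[pkIndexKey(20, 1, 7)] = encodeInt(nil, 1001)

	a, _ := lookupClustered(kv, 10, 7)
	b, _ := lookupNonClustered(kv, 20, 1, 7)
	fmt.Println(string(a), string(b))
}
```

The clustered path touches the store once, while the non-clustered path needs two reads, which is the extra step the post above refers to.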

| username: zhanggame1 | Original post link

When a table is created, its rows are mapped to a key-value structure. Simply put, by default:

  • The key is the unique ID of the table plus the primary key field.
  • The value is the data row storage.

If the primary key is a composite key, or the table is non-clustered, the database generates a hidden internal column _tidb_rowid of type bigint to serve as the unique row identifier, which is then combined with the table's ID to form the key.

| username: liul | Original post link

I saw the following in the PD code, in pd/codec.go:

var (
    tablePrefix  = []byte{'t'}
    recordPrefix = []byte("_r")
)

// GenerateTableKey generates a table split key.
func GenerateTableKey(tableID int64) []byte {
    buf := make([]byte, 0, len(tablePrefix)+8)
    buf = append(buf, tablePrefix...)
    buf = EncodeInt(buf, tableID)
    return buf
}

// GenerateRowKey generates a row key.
func GenerateRowKey(tableID, rowID int64) []byte {
    buf := make([]byte, 0, len(tablePrefix)+len(recordPrefix)+8*2)
    buf = append(buf, tablePrefix...)
    buf = EncodeInt(buf, tableID)
    buf = append(buf, recordPrefix...)
    buf = EncodeInt(buf, rowID)
    return buf
}

These two functions appear to generate the table key and the row key.

| username: liul | Original post link

Haha, this key generation should be done in pd_server. Currently, I’m still looking into how to trace and debug the Go code.

| username: liul | Original post link

Reading it, thank you very much.

| username: 裤衩儿飞上天 | Original post link

:+1: :+1: :+1:

| username: liul | Original post link

Based on the comments left in the code, the data in TiDB is stored in TiKV in the following format:
Key: tablePrefix{tableID}_recordPrefixSep{rowID}
Value: [col1, col2, col3, col4]

Unique Index
Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue // Some documents mention this, but is it still in use?
Key: tablePrefix{tableID}_indexPrefixSep{indexID}_sortKey // Not sure which one is correct
Value: rowID

Non-Unique Index
Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue_rowID
Key: tablePrefix{tableID}_indexPrefixSep{indexID}_sortKeys_rowID
Value: null

Partitioned Table Index
Key: tablePrefix{partitionID}_indexPrefixSep{indexID}_indexedColumnsValue
Value: rowID

Regarding the index, I’m not sure which of the two keys is correct.
The one with indexedColumnsValue seems to be older; is it no longer in use and replaced with a new one?
