Understanding SST Files and Regions in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB的sst文件与region 如何理解

| username: lemonade010

[Reproduction Path] How should I understand SST files and Regions in TiDB, and what is the relationship between them?

| username: zhanggame1 | Original post link

SST files are the storage data files of RocksDB, while regions are a logical concept in TiDB and have nothing to do with physical storage.

| username: 胡杨树旁 | Original post link

The data in the region should all be found in the SST files, right? I understand that SST files are the physical storage of data for many regions, correct?

| username: xingzhenxiang | Original post link

A Region is a logical storage unit.
An SST, on the other hand, is a physical storage file.

| username: tidb菜鸟一只 | Original post link

Both are key:value pairs, but SST is physical storage and is generally sequential, whereas a region is logical. For example, SST1 might have 100 key-value pairs, with 10 possibly belonging to region1 and 10 to region2. Similarly, SST2 might also have 100 key-value pairs, with some belonging to region1 and region2.
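To make the example above concrete, here is a minimal Python sketch. The region boundaries, region names, and SST contents are all made up for illustration; real TiKV keys are encoded bytes, not plain words. It only shows that the keys inside one physical file can fall into several logical key ranges.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: each region owns the half-open range [start, next start).
# region1: ["", "g"), region2: ["g", "p"), region3: ["p", end of key space)
REGION_START_KEYS = ["", "g", "p"]
REGION_NAMES = ["region1", "region2", "region3"]


def region_of(key: str) -> str:
    """Return the name of the (hypothetical) region whose range contains key."""
    return REGION_NAMES[bisect_right(REGION_START_KEYS, key) - 1]


# Two made-up SST files; each holds its keys in sorted order.
SST_FILES = {
    "sst1": ["apple", "banana", "kiwi", "pear"],
    "sst2": ["cherry", "grape", "mango", "zebra"],
}


def regions_per_sst(sst_files):
    """Count, for every SST file, how many of its keys belong to each region."""
    return {
        name: Counter(region_of(key) for key in keys)
        for name, keys in sst_files.items()
    }


if __name__ == "__main__":
    for sst, counts in regions_per_sst(SST_FILES).items():
        print(sst, dict(counts))
```

In this toy layout both files contain keys from all three regions, which is the point of the example: one SST does not map to one region.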

| username: TiDBer_jYQINSnf | Original post link

TiKV as a whole covers one key range, from “” to “”, which can be understood as being able to store any key. Logically, these keys are divided into groups: for example, keys from “” to AAAA form one group, and keys from AAAA to “” form another. Each such group is a Region.

As for SST, it is a file stored on disk by RocksDB.
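The key-range idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Region` class and its methods are hypothetical names, keys are plain strings, the start key is inclusive, the end key is exclusive, and an empty string stands for "no bound" on either side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical model of a Region as a half-open key range [start, end)."""
    start: str  # inclusive; "" means "from the very first key"
    end: str    # exclusive; "" means "up to the very last key"

    def contains(self, key: str) -> bool:
        # Every string is >= "", so an empty start never excludes a key.
        return key >= self.start and (self.end == "" or key < self.end)

    def split(self, split_key: str):
        """Split this range into two adjacent ranges at split_key."""
        if split_key == self.start or not self.contains(split_key):
            raise ValueError("split key must fall strictly inside the region")
        return Region(self.start, split_key), Region(split_key, self.end)


# One range covering all keys, then split at "AAAA" as in the example above.
whole = Region("", "")
left, right = whole.split("AAAA")
```

After the split, `left` covers “” to AAAA and `right` covers AAAA to “”, and every key belongs to exactly one of them.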

| username: 小龙虾爱大龙虾 | Original post link

Is that so? :flushed:

| username: 随缘天空 | Original post link

SST files are the files RocksDB uses to persist MemTable data, while Regions are the logical units into which TiKV partitions its data. SST files concern data persistence, whereas the management of Regions directly affects the load balancing and scalability of the whole distributed database. Specifically:

  1. SST Files (Sorted String Table):

    • In RocksDB, SST files are formed by flushing the MemTable in memory to disk when certain conditions are met.
    • SST files are ordered, with data sorted by Key internally, which improves read efficiency.
    • The target size of SST files can be configured. RocksDB runs Compaction in the background, for example when too many files accumulate in a level, to merge multiple SST files, which reduces the number of files and improves read performance.
  2. Region:

    • In TiKV, data is logically divided into multiple Regions, with each Region responsible for storing a portion of the Key Range.
    • Region is the smallest unit scheduled by PD (Placement Driver). PD is responsible for the allocation and scheduling of Regions to ensure balanced data distribution.
    • The size of a Region can be configured. When a Region grows beyond the configured size it is automatically split, and when Regions become too small PD schedules them to be merged with adjacent Regions, which keeps the system running efficiently.
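The MemTable-to-SST flow described in the first list item can be illustrated with a toy model. This is a sketch, not RocksDB's actual algorithm: the `ToyLSM` class, its entry-count threshold, and the single-step `compact` are all simplifying assumptions, and real RocksDB uses byte-size thresholds, multiple levels, and background threads.

```python
class ToyLSM:
    """Toy model of MemTable flush and compaction (illustrative only)."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable_limit = memtable_limit  # hypothetical threshold, in entries
        self.memtable = {}                    # in-memory writes, unsorted
        self.ssts = []                        # oldest first; each is a sorted list of (key, value)

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the MemTable out as one immutable, key-sorted 'SST'."""
        if self.memtable:
            self.ssts.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self) -> None:
        """Merge all SSTs into one, keeping the newest value for each key."""
        merged = {}
        for sst in self.ssts:  # oldest first, so newer values overwrite older ones
            merged.update(sst)
        self.ssts = [sorted(merged.items())] if merged else []

    def get(self, key: str):
        """Look in the MemTable first, then in SSTs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.ssts):
            for k, v in sst:
                if k == key:
                    return v
        return None
```

With `memtable_limit=2`, four writes produce two small sorted files, and `compact()` folds them into a single file in which an overwritten key appears only once with its latest value.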

| username: zhang_2023 | Original post link

Think of the relationship between a building and its rooms.

| username: 江湖故人 | Original post link

They are all collections of key-value pairs.
SST is a physical concept, while Region is a logical concept. In TiKV, it is a one-to-many relationship.

| username: 数据库真NB | Original post link

  1. SST files are the actual files that store data.
  2. A Region is a logical unit for managing multiple related data files.

The two terms are on different dimensions and have no direct relationship.

| username: 江湖故人 | Original post link

Teachers, does each region have a corresponding SST during backup?

| username: 哈喽沃德 | Original post link

SST is the file format in which TiKV stores data on disk, while Regions are the units for data partitioning and management. Together, they form the foundation of TiDB’s distributed storage and query system.

| username: zhanggame1 | Original post link

Yes, they correspond to each other. Each SST file name includes a regionID.
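As a rough illustration, the snippet below pulls the Region ID out of a backup SST file name. It assumes the name has the form `storeID_regionID_regionEpoch_keyHash_timestamp_cf.sst`; that layout is an assumption here and may differ between BR versions, so check it against your own backup directory. The function name and the sample file name are made up.

```python
from typing import NamedTuple


class BackupSstName(NamedTuple):
    store_id: int
    region_id: int
    region_epoch: int
    key_hash: str
    timestamp: int
    cf: str


def parse_backup_sst_name(file_name: str) -> BackupSstName:
    """Parse a name of the ASSUMED form storeID_regionID_regionEpoch_keyHash_timestamp_cf.sst."""
    stem, dot, ext = file_name.rpartition(".")
    if dot != "." or ext != "sst":
        raise ValueError(f"not an .sst file name: {file_name!r}")
    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"unexpected number of fields in {file_name!r}")
    store_id, region_id, region_epoch, key_hash, timestamp, cf = parts
    return BackupSstName(
        int(store_id), int(region_id), int(region_epoch), key_hash, int(timestamp), cf
    )


# Made-up file name, for illustration only.
info = parse_backup_sst_name("1_42_7_abc123_1700000000000_write.sst")
print(info.region_id)
```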

| username: lemonade010 | Original post link

@all Thank you all for your replies.
