Understanding SST Files and Regions in TiDB

translator_bot · June 21, 2024, 4:26am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB的sst文件与region 如何理解

| username: lemonade010

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] How should I understand SST files and Regions in TiDB, and what is the relationship between them?
[Encountered Issues: Problem Phenomenon and Impact]
[Attachments: Screenshots / Logs / Monitoring]

translator_bot · June 21, 2024, 4:26am

| username: zhanggame1 | Original post link

SST files are RocksDB's on-disk data files, while a Region is a logical concept in TiDB (TiKV) and does not correspond to any particular physical file.

translator_bot · June 21, 2024, 4:26am

| username: 胡杨树旁 | Original post link

All the data in a Region can be found in SST files, right? My understanding is that SST files are the physical storage for the data of many Regions. Is that correct?

translator_bot · June 21, 2024, 4:26am

| username: xingzhenxiang | Original post link

A region is a logical storage unit.
Isn’t SST a physical storage file?

translator_bot · June 21, 2024, 4:26am

| username: tidb菜鸟一只 | Original post link

Both hold key-value pairs, but an SST is physical storage and is generally sorted by key, whereas a Region is logical. For example, SST1 might hold 100 key-value pairs, of which 10 belong to region1 and 10 to region2. Similarly, SST2 might also hold 100 key-value pairs, some of which belong to region1 and some to region2.
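
To make the example above concrete, here is a minimal sketch in Python. The Region boundaries, file names, and keys are all made up for illustration, and plain strings stand in for TiKV's real byte keys. It only shows that one sorted file can hold keys owned by several Regions.

```python
# Hypothetical Region layout: each Region owns the keys in [start, end).
# An empty start means "from the beginning"; an empty end means "to the end".
REGIONS = [
    ("region1", "", "m"),
    ("region2", "m", ""),
]

# Hypothetical SST files: each one is a key-sorted run of key-value pairs.
SST_FILES = {
    "sst1": [("apple", 1), ("kiwi", 2), ("melon", 3), ("pear", 4)],
    "sst2": [("banana", 5), ("zebra", 6)],
}


def region_of(key):
    """Return the name of the Region whose key range contains `key`."""
    for name, start, end in REGIONS:
        if key >= start and (end == "" or key < end):
            return name
    raise KeyError(key)


def regions_per_sst(sst_files):
    """For each SST file, list the Regions that own at least one of its keys."""
    return {
        sst_name: sorted({region_of(key) for key, _ in pairs})
        for sst_name, pairs in sst_files.items()
    }


print(regions_per_sst(SST_FILES))
```

In this toy layout both files contain keys from both Regions, which is the point of the example: a physical file and a logical range are cut along different lines.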

translator_bot · June 21, 2024, 4:26am

| username: TiDBer_jYQINSnf | Original post link

The whole of TiKV can be seen as one key range, from “” to “” (an empty start key means the very beginning and an empty end key means the very end), so it can hold any key. Logically, these keys are then divided into groups. For example, “” to AAAA is one group, and AAAA to “” is another. Each such group is a Region.

As for SST, it is a file stored on disk by RocksDB.
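
A small sketch of that key-range idea, assuming the two Regions from the example above (“” to AAAA, and AAAA to “”). The Region IDs and the `find_region` helper are hypothetical, and plain strings stand in for TiKV's byte keys. Finding the Region for a key is just a search over the sorted start keys.

```python
from bisect import bisect_right

# Sorted Region start keys. "" as the first start key means "from the very
# beginning"; the last Region implicitly runs to the end of the key space.
START_KEYS = ["", "AAAA"]
REGION_IDS = [1, 2]  # hypothetical IDs, one per start key


def find_region(key):
    """Return the ID of the Region whose range [start, next_start) holds `key`."""
    # bisect_right gives the number of start keys <= key; the Region that owns
    # the key is the last one whose start key is not greater than it.
    idx = bisect_right(START_KEYS, key) - 1
    return REGION_IDS[idx]


for k in ["", "AAA", "AAAA", "zzz"]:
    print(repr(k), "-> region", find_region(k))
```

Because every possible key falls between some start key and the next one, every key belongs to exactly one Region, regardless of which SST file it ends up in.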

translator_bot · June 21, 2024, 4:26am

| username: 小龙虾爱大龙虾 | Original post link

Is that so?

translator_bot · June 21, 2024, 4:26am

| username: 随缘天空 | Original post link

SST files are the files RocksDB uses to persist MemTable data, while Regions are the logical partitioning units for data in TiKV. SST files are about data persistence, whereas the management of Regions directly affects the load balancing and scalability of the whole distributed database. Specifically:

SST Files (Sorted String Table):
- In RocksDB, an SST file is created when the in-memory MemTable is flushed to disk once certain conditions are met (for example, when the MemTable is full).
- SST files are ordered: data is sorted by key inside each file, which improves read efficiency.
- The target size of SST files can be configured. Compaction merges multiple SST files when thresholds such as the number of files or the amount of data in a level are exceeded, which reduces the number of files and optimizes read/write performance.

Region:
- In TiKV, data is logically divided into multiple Regions, with each Region responsible for one contiguous key range.
- Region is the smallest unit scheduled by PD (Placement Driver). PD is responsible for the allocation and scheduling of Regions to ensure balanced data distribution.
- The size of a Region can be configured. When a Region grows too large it is split, and when adjacent Regions become too small they are merged, based on the configuration, to maintain efficient system operation.
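
The flush and compaction steps described above can be sketched roughly as follows. This is a toy model in Python, not RocksDB's actual implementation: the `flush` and `compact` functions are hypothetical names, a dict stands in for the MemTable, and a sorted list of pairs stands in for an SST file.

```python
import heapq


def flush(memtable):
    """Turn an in-memory table into an immutable, key-sorted run (a toy SST)."""
    return sorted(memtable.items())


def compact(newer, older):
    """Merge two key-sorted runs into one; on duplicate keys the newer value wins."""
    # Tag each entry with its age (0 = newer, 1 = older) so that, for equal
    # keys, the entry from the newer run comes out of the merge first.
    tagged = heapq.merge(
        ((key, 0, value) for key, value in newer),
        ((key, 1, value) for key, value in older),
    )
    merged = []
    for key, _, value in tagged:
        if merged and merged[-1][0] == key:
            continue  # the newer entry for this key has already been kept
        merged.append((key, value))
    return merged


old_sst = flush({"b": 1, "a": 1, "d": 1})  # flushed first
new_sst = flush({"c": 2, "a": 2})          # flushed later, overwrites "a"
print(compact(new_sst, old_sst))
```

Because both inputs are already sorted, the merge is a single sequential pass, which is why keeping SST files ordered makes compaction and reads cheap. None of this knows anything about Regions: the files are organized purely by key order.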

translator_bot · June 21, 2024, 4:26am

| username: zhang_2023 | Original post link

It is like a building and its rooms.

translator_bot · June 21, 2024, 4:26am

| username: 江湖故人 | Original post link

Both are collections of key-value pairs.
SST is a physical concept, while Region is a logical concept. In TiKV, it is a one-to-many relationship.

translator_bot · June 21, 2024, 4:26am

| username: 数据库真NB | Original post link

1. SST files are the actual files that store data. 2. A Region is a logical unit for managing multiple related data files.
The two concepts sit at different levels and have no direct relationship.

translator_bot · June 21, 2024, 4:26am

| username: 江湖故人 | Original post link

Experts, during a backup, does each Region have a corresponding SST file?

translator_bot · June 21, 2024, 4:26am

| username: 哈喽沃德 | Original post link

SST files are the data file format stored in TiKV, while Regions are the units for data partitioning and management. Together, they form the foundation of TiDB’s distributed storage and query system.

translator_bot · June 21, 2024, 4:26am

| username: zhanggame1 | Original post link

Yes, in a backup they correspond to each other. Each backup SST file name includes a regionID.

translator_bot · June 21, 2024, 4:26am

| username: lemonade010 | Original post link

@all Thank you all for your replies.
