Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB的sst文件与region 如何理解 (How to understand SST files and Regions in TiDB)
[Reproduction Path] How should I understand SST files and Regions in TiDB, and what is the relationship between them?
SST files are RocksDB's on-disk data files, while Regions are a logical concept in TiKV. A Region is a key range, not a physical file.
The data in the region should all be found in the SST files, right? I understand that SST files are the physical storage of data for many regions, correct?
A region is a logical storage unit.
Isn’t SST a physical storage file?
Both hold key-value pairs, but an SST is a physical file whose contents are sorted by key, whereas a Region is a logical key range. For example, SST1 might hold 100 key-value pairs, 10 of which belong to region1 and 10 to region2. Similarly, SST2 might also hold 100 key-value pairs, some of which belong to region1 and some to region2.
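To make the example above concrete, here is a minimal sketch in Python. The Region ranges, SST names, and keys are all made up for illustration, and real SST files and Regions are of course much larger. It only shows that one SST file can contain keys from several Regions, and that one Region's keys can appear in several SST files.

```python
from collections import Counter

# Hypothetical Regions, each a half-open key range [start, end).
REGIONS = {
    "region1": ("a", "m"),
    "region2": ("m", "z"),
}

# Hypothetical SST files: each one is just a sorted list of keys.
SST_FILES = {
    "sst1": ["apple", "banana", "mango", "pear"],
    "sst2": ["cherry", "grape", "orange"],
}

def region_of(key):
    """Return the name of the Region whose range contains key."""
    for name, (start, end) in REGIONS.items():
        if start <= key < end:
            return name
    raise KeyError(key)

def regions_per_sst(sst_files):
    """For every SST file, count how many of its keys fall in each Region."""
    return {
        sst: Counter(region_of(k) for k in keys)
        for sst, keys in sst_files.items()
    }

summary = regions_per_sst(SST_FILES)
for sst, counts in summary.items():
    print(sst, dict(counts))
```

In this toy data both files contain keys from both Regions, which is the situation described above.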
The whole of TiKV covers a single key range, from “” to “”, which means it can store every possible key. Logically, that range is divided into groups: for example, “” to AAAA is one group, and AAAA to “” is another. Each such group is a Region.
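As a rough sketch of this idea (not how TiKV or PD actually implement it), the key space can be modelled as a sorted list of split keys, with an empty string standing for the open start and open end. The split keys and the function names below are hypothetical.

```python
import bisect

# Hypothetical split keys; they cut the whole key space ("" to "") into Regions.
SPLIT_KEYS = ["AAAA", "MMMM"]

def region_ranges(split_keys):
    """Build [start, end) ranges; "" marks the open start and the open end."""
    bounds = [""] + list(split_keys) + [""]
    return list(zip(bounds[:-1], bounds[1:]))

def locate(key, split_keys=SPLIT_KEYS):
    """Return the index of the Region whose [start, end) range contains key."""
    # A key equal to a split key belongs to the Region that starts there.
    return bisect.bisect_right(split_keys, key)

ranges = region_ranges(SPLIT_KEYS)
print(ranges)
print(locate("AAA"), locate("AAAA"), locate("ZZZ"))
```

Two split keys give three Regions, and every key, however small or large, lands in exactly one of them.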
As for SST, it is a file stored on disk by RocksDB.
SST files are the files RocksDB uses to persist MemTable data to disk, while Regions are the logical units into which TiKV partitions its data. SST files are about data persistence, whereas how Regions are managed directly affects the load balancing and scalability of the whole distributed database. Specifically:
- SST Files (Sorted String Table):
  - In RocksDB, SST files are created by flushing the in-memory MemTable to disk when certain conditions are met, for example when the MemTable is full.
  - SST files are ordered: the data inside each file is sorted by key, which improves read efficiency.
  - The target size of SST files can be configured. Compaction is triggered when thresholds such as the number of level-0 files or the total size of a level are exceeded; it merges multiple SST files, reducing the number of files and optimizing read/write performance.
- Region:
  - In TiKV, data is logically divided into multiple Regions, with each Region responsible for one key range.
  - A Region is the smallest unit scheduled by PD (Placement Driver). PD is responsible for the placement and scheduling of Regions to keep data evenly distributed.
  - The size of a Region can be configured. When a Region grows beyond the configured threshold, TiKV splits it; when adjacent Regions are too small, PD schedules a merge.
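The mechanisms in the list above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyMemTable`, `compact`, and `split_region` are hypothetical names, thresholds are counted in entries rather than bytes, and real RocksDB and TiKV behaviour is far more involved.

```python
import heapq

class ToyMemTable:
    """In-memory write buffer; flush() turns it into a sorted, immutable run."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # hypothetical flush threshold
        self.data = {}
        self.sst_files = []  # each "SST" is a list of (key, value) sorted by key

    def put(self, key, value):
        self.data[key] = value
        if len(self.data) >= self.max_entries:
            self.flush()

    def flush(self):
        if self.data:
            self.sst_files.append(sorted(self.data.items()))
            self.data = {}

def compact(sst_files):
    """Merge sorted runs into one sorted run; the newest value of a key wins."""
    # Tag entries with the run index so equal keys are ordered oldest first.
    runs = [[(k, i, v) for k, v in run] for i, run in enumerate(sst_files)]
    merged = {}
    for key, _, value in heapq.merge(*runs):
        merged[key] = value  # a newer run overwrites an older one
    return list(merged.items())

def split_region(keys, max_keys):
    """Split a sorted key list in the middle once it exceeds max_keys."""
    if len(keys) <= max_keys:
        return [keys]
    mid = len(keys) // 2
    return [keys[:mid], keys[mid:]]

mt = ToyMemTable(max_entries=3)
for k, v in [("b", 1), ("a", 2), ("c", 3), ("a", 9), ("d", 4), ("e", 5)]:
    mt.put(k, v)

merged = compact(mt.sst_files)
halves = split_region([k for k, _ in merged], max_keys=4)
print(len(mt.sst_files), merged, halves)
```

Note that the flush and the compaction operate on files, while the split operates on a key range. The two are independent of each other, which is why SST files and Regions do not line up one to one.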
They are all collections of key-value pairs.
SST is a physical concept, while Region is a logical concept. In TiKV, one SST file can hold data from many Regions, and the data of one Region can be spread across many SST files.
Experts, during a backup, does each Region have a corresponding SST file?
SST is the on-disk file format in which TiKV (through RocksDB) stores data, while Regions are the units for data partitioning and management. Together, they form the foundation of TiDB’s distributed storage and query system.
Yes, for backups they correspond to each other. Each backup SST file name includes a regionID.
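If you want to check this on your own backup, something like the sketch below could pull the Region ID out of a file name. It assumes the name layout `storeID_regionID_regionEpoch_keyHash_timestamp_cf.sst`, which is how I remember the BR documentation describing it; please verify it against the files your version actually produces. The helper name and the sample file name are made up.

```python
from typing import NamedTuple

class BackupSstName(NamedTuple):
    store_id: int
    region_id: int
    region_epoch: int
    key_hash: str
    timestamp: str
    cf: str

def parse_backup_sst_name(file_name):
    """Split an assumed 'store_region_epoch_hash_ts_cf.sst' name into fields."""
    stem = file_name[:-len(".sst")] if file_name.endswith(".sst") else file_name
    parts = stem.split("_")
    if len(parts) != 6:
        # The layout is an assumption, so fail loudly instead of guessing.
        raise ValueError(f"unexpected backup SST name: {file_name!r}")
    store_id, region_id, epoch, key_hash, ts, cf = parts
    return BackupSstName(int(store_id), int(region_id), int(epoch), key_hash, ts, cf)

# Hypothetical file name, for illustration only.
info = parse_backup_sst_name("1_42_7_abc123_1700000000000_default.sst")
print(info.region_id, info.cf)
```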
@all Thank you all for your replies.