Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Why is the maximum Region size set to 96M?
After reading “TiDB in Action,” I understand that data from multiple tables is written into the same RocksDB instance, so they share one 128 MB memtable (the flush size). This implies that a Region is just a logical concept and data is not physically stored by Region (although, because keys are structured as table ID + row ID, data from the same table and the same Region does end up sorted together).

So what is the significance of capping a Region at 96 MB? Is it just to create multiple write points by splitting Regions? If so, why not let me specify the Regions when creating the table? Also, wouldn’t it be better to trigger Region splits based on a row-count threshold? Since the data of a single Region does not form an independent file, calculating its space usage seems hard to me (I still don’t know how TiKV determines that a Region has reached 96 MiB). I’m thinking about this from an application-development perspective and feel a bit puzzled.
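For a concrete picture of “a Region is just a logical concept,” here is a minimal sketch. It assumes a simplified, human-readable key layout `t{table_id}_r{row_id}` purely for illustration; the real TiDB encoding is a binary memcomparable format. All rows live in one shared sorted keyspace, and a Region is nothing more than a half-open key range over it:

```python
# Minimal model: every row of every table lives in one shared, sorted keyspace
# (RocksDB), and a Region is only a half-open key range [start_key, end_key).
# The readable key layout below is an assumption for illustration; real TiDB
# uses a binary memcomparable encoding.

def row_key(table_id: int, row_id: int) -> str:
    # Conceptual layout t<table_id>_r<row_id>: rows of one table sort together.
    return f"t{table_id:08d}_r{row_id:016d}"

# Rows from two tables, written in interleaved order, still sort by table then row.
keys = sorted(row_key(t, r) for t, r in [(11, 3), (12, 1), (11, 1), (12, 2)])

# "A Region" is just a contiguous slice of that sorted keyspace, not a file.
region = {"start_key": row_key(11, 0), "end_key": row_key(12, 0)}

in_region = [k for k in keys if region["start_key"] <= k < region["end_key"]]
print(in_region)  # only table 11's rows fall inside this Region's range
```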
It is probably a comprehensive trade-off among factors such as performance versus resource overhead, scalability and flexibility, and operational constraints.
96 MiB is a recommended default value. You can adjust the Region size based on actual conditions and needs. According to the documentation, the recommended range for Region size is [48MiB, 258MiB], with commonly used sizes including 96 MiB, 128 MiB, and 256 MiB. Avoid setting the Region size to exceed 1 GiB to prevent performance fluctuations and query performance degradation.
Region is a logical concept; RocksDB itself has no notion of a Region, and Regions are not used as the unit of physical data storage. A Region is the unit the distributed database uses for scheduling.
These are all general statements. Is there any experimental data showing that this Region size improves system performance?
The reason I raised this question is that every table is different, and the division into Regions should be decided according to each table’s specific situation. For example, some tables are loaded once and then stay static with a small amount of data; for those you don’t need to worry much, and a single Region is enough. Other tables see frequent deletes and updates and need large amounts of data imported; for those, maintaining multiple Regions is necessary to get the benefit of distribution. Isn’t it somewhat heavy-handed to set this Region size at the database level?
Even for a static table, having only one Region hurts read performance, because all reads for that table land on a single Region (and therefore essentially a single TiKV node) with no parallelism.
The default Region size is 96M. A Region that is too large or too small is not ideal, so Regions are both split and merged, and each operation is triggered by a threshold range; a rough sketch of the trigger logic is below.
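A minimal sketch of that trigger logic, assuming the TiKV/PD defaults as I recall them from the docs (`coprocessor.region-split-size` / `region-max-size` / `region-max-keys` on TiKV, `schedule.max-merge-region-size` / `max-merge-region-keys` on PD); check the values for your version, and note this is a simplified model, not TiKV’s actual code:

```python
# Simplified model of the split/merge thresholds around the 96 MiB default.
# Numbers are the documented defaults as I recall them; verify per version.
MIB = 1024 * 1024

REGION_SPLIT_SIZE = 96 * MIB       # target size of each piece after a split
REGION_MAX_SIZE   = 144 * MIB      # split triggers above this (1.5x split size)
REGION_MAX_KEYS   = 1_440_000      # splits are also triggered by key count
MAX_MERGE_SIZE    = 20 * MIB       # PD merges a Region smaller than this...
MAX_MERGE_KEYS    = 200_000        # ...and holding fewer keys than this

def should_split(approx_size: int, approx_keys: int) -> bool:
    # TiKV's split checker fires once the approximate size or key count
    # passes the max; the Region is then cut into ~REGION_SPLIT_SIZE pieces.
    return approx_size > REGION_MAX_SIZE or approx_keys > REGION_MAX_KEYS

def should_merge(approx_size: int, approx_keys: int) -> bool:
    # PD only treats a Region as a merge candidate when it is small both ways.
    return approx_size < MAX_MERGE_SIZE and approx_keys < MAX_MERGE_KEYS

print(should_split(150 * MIB, 900_000))   # True  -> split
print(should_merge(5 * MIB, 10_000))      # True  -> merge into a neighbour
```

Note that splitting is in fact also triggered by key count (`region-max-keys`), which partly answers the “why not split by row count” question in the original post.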
Theory is connected to practice, and there are many similar situations in real life.
It’s not good for a region to be too large or too small; it needs to be balanced. Other distributed systems are similar. For example, a chunk in MongoDB is 64MB.
It is definitely a suggested default chosen as a trade-off, and there is also room to adjust it. That is how a product should work: you tune the Region size according to business needs, because the database cannot cover every scenario with a single value.
96M should be the configured default (the split threshold), not the initial size of a newly created Region.
96MB looks like a value chosen from network-transfer experience:
on a 100 Mb link, the effective transfer rate is roughly 9.6 MB/s;
on a Gigabit (1000 Mb) link, it is roughly 96 MB/s (1000 Mb/s ÷ 8 ≈ 125 MB/s raw, minus protocol overhead),
which is exactly the size of a Region. Keeping a Region no larger than this is good for scheduling and concurrency, because moving one Region does not run into physical network limits, which makes it a good optimization point (worked out in the sketch below).
Therefore, dual 10 Gigabit network cards are recommended for production TiDB clusters.
For your reference!
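A worked version of that back-of-the-envelope arithmetic; the 0.77 efficiency factor for protocol overhead is an assumption chosen to reproduce the figures above, not an official number:

```python
# Convert raw link bandwidth (megabits/s) to usable throughput (megabytes/s).
# The 0.77 efficiency factor (protocol/framing overhead) is an assumption
# used to reproduce the ~9.6 MB/s and ~96 MB/s figures above.

def usable_mb_per_s(link_mbps: float, efficiency: float = 0.77) -> float:
    return link_mbps * efficiency / 8  # bits -> bytes

for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mb/s link -> ~{usable_mb_per_s(mbps):.1f} MB/s usable")
# roughly 9.6, 96 and 960 MB/s respectively; at ~96 MB/s, a 96 MiB Region
# moves across a Gigabit link in about one second.
```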
I see, learning from the expert~!
Regarding why Regions are not specified at table creation time, and why splitting is not triggered by row count, the main considerations are as follows:
Flexibility: Dynamically adjusting the size and distribution of Regions can better adapt to changes in different workloads without needing to know the data characteristics and access patterns of each table in advance.
System-level optimization: having the system manage Region splitting and merging automatically spares users from needing deep knowledge of the underlying storage details, which makes TiDB easier to use and maintain.
As for how the space occupied by a Region is calculated: TiKV maintains metadata for each Region (such as its approximate storage usage). This includes tracking each Region’s writes and deletes, together with periodic checks and the garbage-collection process. With this information the system has a near-real-time view of each Region’s state and can decide when to split or merge.
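On the “how does TiKV know a Region has reached 96 MiB” point from the original question: as I understand it, there is no per-Region file. TiKV estimates a Region’s size from statistics RocksDB collects per SST file (cumulative size offsets recorded by a table-properties collector), summing the portion of each SST that overlaps the Region’s key range, and a periodic split checker scans to confirm when the estimate crosses the threshold. The following is a toy model of that estimation, with made-up keys and checkpoint granularity, not TiKV’s actual code:

```python
# Toy model: each SST file records cumulative (key, bytes_written_so_far)
# checkpoints while it is built; the size of any key range is then estimated
# by differencing offsets, summed over every SST that overlaps the range.
from bisect import bisect_right

class SstProps:
    def __init__(self, checkpoints):
        # checkpoints: sorted list of (key, cumulative_bytes) pairs
        self.keys = [k for k, _ in checkpoints]
        self.offsets = [o for _, o in checkpoints]

    def size_up_to(self, key: str) -> int:
        i = bisect_right(self.keys, key)
        return self.offsets[i - 1] if i else 0

    def range_size(self, start: str, end: str) -> int:
        return max(0, self.size_up_to(end) - self.size_up_to(start))

def approximate_region_size(ssts, start: str, end: str) -> int:
    return sum(s.range_size(start, end) for s in ssts)

# Two SST files whose data straddles the Region's range [t11_r0000, t12_r0000).
ssts = [
    SstProps([("t11_r0001", 0), ("t11_r5000", 40 << 20), ("t12_r0001", 70 << 20)]),
    SstProps([("t11_r2000", 0), ("t11_r9000", 60 << 20)]),
]
size = approximate_region_size(ssts, "t11_r0000", "t12_r0000")
print(size >> 20, "MiB (approximate)")  # an estimate, not an exact scan
```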
Actually, it is a relatively balanced point found after multiple attempts.
The default size will soon be 1G.
The system default is 96M; you can change it by modifying the relevant parameters.
The answers from all the experts have been very helpful to me, thank you all.
96M is probably a suitable value found through continuous testing.
It is probably a consideration related to Raft replication, since a Region is the unit that the Raft protocol replicates (and snapshots).