Why is the maximum size of a Region set to 96M?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Why should the maximum size of a Region be set to 96M?

| username: vincentLi

After reading “TiDB in Action,” I understand that data from multiple tables is written into the same RocksDB instance, so the tables share memtables with a 128MB flush size. This implies that a table’s Region is only a logical concept and that data is not physically stored by Region (although, because keys are built from table ID + row ID, rows from the same table and Region end up sorted next to each other).

So what is the point of capping a Region at 96MB? Is it just to create multiple write points by splitting Regions? If that is the only goal, why not specify the Regions when creating the table? And wouldn’t it be better to trigger Region splits based on a row-count threshold?

Since a single Region’s data does not form an independent file, I imagine its space usage is hard to calculate (I still don’t know how TiKV determines that a Region has used 96MiB of space). I’m looking at this from an application development perspective and am a bit puzzled.
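The ordering point in the question can be illustrated with a simplified sketch of TiDB's row-key layout: `t` + table ID + `_r` + row ID, with each integer written in a memory-comparable form (sign bit flipped, big-endian) so that byte order matches numeric order. Because all tables share one sorted key space, rows of the same table end up adjacent regardless of write order. This is a simplification under stated assumptions: TiKV further encodes the key and appends an MVCC timestamp before it reaches RocksDB, and the helper names below are hypothetical.

```python
import struct

SIGN_BIT = 1 << 63
MASK64 = (1 << 64) - 1


def encode_i64(v: int) -> bytes:
    """Memory-comparable int64: flip the sign bit, then write big-endian."""
    return struct.pack(">Q", (v & MASK64) ^ SIGN_BIT)


def decode_i64(b: bytes) -> int:
    """Inverse of encode_i64."""
    u = struct.unpack(">Q", b)[0] ^ SIGN_BIT
    return u - (1 << 64) if u >= SIGN_BIT else u


def row_key(table_id: int, row_id: int) -> bytes:
    """Hypothetical helper for a simplified record key: b't' + tableID + b'_r' + rowID."""
    return b"t" + encode_i64(table_id) + b"_r" + encode_i64(row_id)


def parse_row_key(key: bytes) -> tuple[int, int]:
    # Layout: 1 byte 't', 8 bytes tableID, 2 bytes '_r', 8 bytes rowID.
    return decode_i64(key[1:9]), decode_i64(key[11:19])


if __name__ == "__main__":
    # Rows from three tables written in interleaved order.
    writes = [(47, 1), (45, 2), (46, 100), (45, 1), (47, -3), (45, 10)]
    ordered = [parse_row_key(k) for k in sorted(row_key(t, r) for t, r in writes)]
    print(ordered)  # rows of each table come out grouped and in row-ID order
```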

| username: zhaokede | Original post link

It is probably the result of weighing several factors together: the balance between performance and resource overhead, scalability and flexibility, and other constraints.

| username: 大飞哥online | Original post link

  1. Performance balance: A well-chosen Region size balances data distribution against query performance. Regions that are too small lead to an excessive number of Regions, which increases resource consumption and degrades performance; Regions that are too large can cause unstable performance and slower queries.
  2. Resource consumption: Regions that are too large can consume excessive resources and hurt TiKV’s performance.
  3. Region scheduling: Regions that are too large slow down Region scheduling, which affects data balance and performance.

| username: 大飞哥online | Original post link

96 MiB is a recommended default value, and you can adjust the Region size to suit your actual conditions and needs. According to the documentation, the recommended range for the Region size is [48MiB, 258MiB], and commonly used sizes include 96 MiB, 128 MiB, and 256 MiB. Setting the Region size above 1 GiB is not recommended, because it can cause performance jitter and degrade query performance.
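To make the "too many small Regions" trade-off concrete, here is back-of-the-envelope arithmetic for how many Regions 1 TiB of data produces at the sizes mentioned above. It assumes, as a simplification, that every Region is filled exactly to its split size and that each Region has three replicas; real Region sizes vary, so treat the numbers as estimates.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def region_estimate(data_bytes: int, region_size_bytes: int, replicas: int = 3) -> tuple[int, int]:
    """Return (regions, region replicas), assuming every Region is filled to region_size_bytes."""
    regions = -(-data_bytes // region_size_bytes)  # ceiling division
    return regions, regions * replicas


if __name__ == "__main__":
    for size_mib in (96, 128, 256, 1024):
        regions, peers = region_estimate(1 * TIB, size_mib * MIB)
        print(f"{size_mib:>5} MiB Regions: {regions:>6} Regions, {peers:>6} replicas per 1 TiB")
```

Each of those Regions is a Raft group whose leader reports to PD, so shrinking the Region size multiplies the metadata and heartbeat traffic, while growing it makes each individual scheduling move heavier.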

| username: zhanggame1 | Original post link

A Region is a logical concept: RocksDB itself has no notion of Regions, and they play no role in how data is physically stored. A Region is the unit of scheduling in the distributed database.
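A minimal sketch of what "Region as a logical unit" means: a Region is just a contiguous key range [start_key, end_key), and routing a key means finding the range that contains it, for example with a binary search over sorted start keys. The `Region` class and `locate` function are hypothetical illustrations, not TiKV or PD APIs, and the keys are readable stand-ins rather than real encoded keys; real Region metadata also carries epochs, peers, and leaders.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive; b"" means "up to the end of the key space"


def locate(regions: list[Region], key: bytes) -> Region:
    """Find the Region whose [start_key, end_key) contains key.

    `regions` must be sorted by start_key.
    """
    starts = [r.start_key for r in regions]
    idx = bisect.bisect_right(starts, key) - 1
    if idx < 0:
        raise ValueError("key precedes the first Region")
    region = regions[idx]
    if region.end_key and key >= region.end_key:
        raise ValueError("key falls in a gap between Regions")
    return region


if __name__ == "__main__":
    regions = [Region(2, b"", b"t1"), Region(3, b"t1", b"t2"), Region(4, b"t2", b"")]
    for key in (b"a", b"t1_r5", b"t2"):
        print(key, "->", locate(regions, key).region_id)
```

In TiKV, splitting a Region is likewise a metadata change: it adds a boundary key without rewriting the data stored in RocksDB.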

| username: vincentLi | Original post link

These are all general statements. Is there any experimental data showing that this Region setting improves system performance?

| username: vincentLi | Original post link

I raised this question because every table is different, so the way Regions are divided should depend on each table’s specific circumstances. For example, some tables are inserted into once and then stay static with a small amount of data; for these you don’t need to worry much and could keep them in a single Region. Other tables undergo frequent deletes and updates and need large amounts of data imported; for these, multiple Regions are necessary to benefit from the distributed design. Isn’t it somewhat reckless to set this Region configuration at the database level?

| username: wfxxh | Original post link

Even for a static table, having only one Region hurts read performance, because by default every read of that table is served by that one Region’s leader on a single TiKV node.

| username: TIDB-Learner | Original post link

The default Region size is 96M, and Regions that are too large or too small are both undesirable. That is why Regions can be split or merged: growing past an upper threshold triggers a split, and shrinking below a lower threshold makes a Region a merge candidate.

Theory is tied to practice, and there are many similar trade-offs in real life.
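The split/merge thresholds can be sketched as a simple size-only decision. Assumptions: the split side follows TiKV's documented default of deriving `region-max-size` as 1.5 times `region-split-size` (144 MiB for 96 MiB); the 20 MiB merge threshold is only an illustrative stand-in for PD's `max-merge-region-size`, so check your version's default. Real TiKV and PD also consider key counts, and PD only merges with a suitable adjacent Region. Function names are hypothetical.

```python
import math
from typing import Optional

MIB = 1024 ** 2


def plan_region(size_bytes: int,
                split_size: int = 96 * MIB,
                max_size: Optional[int] = None,
                merge_below: int = 20 * MIB) -> str:
    """Return "split", "merge", or "keep" for a Region of size_bytes (size-only model)."""
    if max_size is None:
        # When not set explicitly, TiKV derives region-max-size as 1.5x region-split-size.
        max_size = split_size * 3 // 2
    if size_bytes > max_size:
        return "split"
    if size_bytes < merge_below:
        return "merge"
    return "keep"


def approx_split_count(size_bytes: int, split_size: int = 96 * MIB) -> int:
    """Rough number of pieces if split keys are placed about every split_size bytes."""
    return math.ceil(size_bytes / split_size)


if __name__ == "__main__":
    for mib in (5, 96, 144, 150, 300):
        action = plan_region(mib * MIB)
        extra = f" (~{approx_split_count(mib * MIB)} pieces)" if action == "split" else ""
        print(f"{mib:>4} MiB -> {action}{extra}")
```

The gap between the split size and the maximum size means a freshly split Region has room to grow before it is split again.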

| username: TiDBer_lBAxWjWQ | Original post link

A Region should be neither too large nor too small; the size has to be balanced. Other distributed systems work the same way: for example, MongoDB’s default chunk size was 64MB (raised to 128MB in MongoDB 6.0).

| username: MrSylar | Original post link

It is definitely a “suggested” default based on trade-offs, and there is also provision for adjusting it. That is the right approach for a product: the database cannot cover every scenario, so users adjust the Region size according to their business needs.

| username: zhaokede | Original post link

96M is presumably the default threshold setting, not the initial size of a new Region.

| username: xfworld | Original post link

96MB is based on experience with network transmission optimization:

On a 1,000 Mbps (gigabit) network, the transfer rate is approximately 9.6 MB/s.
On a 10,000 Mbps (10 GbE) network, the transfer rate is approximately 96 MB/s,
which is exactly the size of a Region. Keeping a Region within this size benefits scheduling and concurrency, because moving one Region stays within the physical limits of the network, which makes it the best optimization point.

Therefore, dual 10 GbE network cards are recommended for production TiDB clusters.

For your reference!
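For comparison, here is the plain theoretical line-rate conversion (8 bits per byte, ignoring protocol overhead, so real throughput is somewhat lower) and the resulting time to transfer one full 96 MiB Region. This is arithmetic only, not a measurement.

$$
\begin{aligned}
\text{throughput (MB/s)} &= \frac{\text{link speed (Mbit/s)}}{8\ \text{bit/byte}} \\
1{,}000\ \text{Mbit/s} &\rightarrow 125\ \text{MB/s}, \qquad 10{,}000\ \text{Mbit/s} \rightarrow 1{,}250\ \text{MB/s} \\
96\ \text{MiB} &= 96 \times 1.048576\ \text{MB} \approx 100.7\ \text{MB} \\
t_{1\,\text{GbE}} &\approx \frac{100.7}{125} \approx 0.81\ \text{s}, \qquad t_{10\,\text{GbE}} \approx \frac{100.7}{1{,}250} \approx 0.081\ \text{s}
\end{aligned}
$$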

| username: 呢莫不爱吃鱼 | Original post link

I see, learning from the expert~!

| username: gary | Original post link

As for why the Region size isn’t specified when creating a table, or why Region splits aren’t triggered by row count, the main considerations are:
Flexibility: Adjusting the size and distribution of Regions dynamically adapts better to changing workloads, without having to know each table’s data characteristics and access patterns in advance.
System-level optimization: Letting the system split and merge Regions automatically means users don’t need a deep understanding of the underlying storage, which makes TiDB easier to use and maintain.
As for how a Region’s space usage is calculated, TiKV does not measure a separate file; it estimates. It tracks the bytes written to each Region, and once enough new data has accumulated, a periodic split check estimates the Region’s approximate size from size statistics stored in RocksDB’s SST file properties. Writes, deletes, and the periodic garbage collection of old versions all change these figures, so the system keeps a reasonably current view of each Region; TiKV splits Regions that exceed the threshold, and PD, which receives approximate sizes in Region heartbeats, decides on merges.
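A minimal sketch of that write-triggered check, under a simplified model: each Region keeps a running count of bytes written since its last check, and only when that count reaches a threshold (TiKV's `raftstore.region-split-check-diff`, documented as 1/16 of the split size by default) does a periodic tick run the more expensive size estimate. The estimate here is a stand-in callable; in TiKV it comes from SST property statistics. TiKV also checks key counts (`coprocessor.region-split-keys`, 960,000 alongside the 96 MiB size default, with `region-max-keys` at 1.5 times that), which partly answers the row-count question. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MIB = 1024 ** 2


@dataclass
class SplitChecker:
    """Hypothetical model of a periodic, write-triggered split check for one Region."""
    estimate: Callable[[], Tuple[int, int]]  # returns (approx_bytes, approx_keys)
    split_size: int = 96 * MIB
    split_keys: int = 960_000
    written_since_check: int = 0

    @property
    def max_size(self) -> int:
        return self.split_size * 3 // 2

    @property
    def max_keys(self) -> int:
        return self.split_keys * 3 // 2

    @property
    def check_diff(self) -> int:
        return self.split_size // 16

    def on_write(self, nbytes: int) -> None:
        # Cheap bookkeeping on every write; no size estimate yet.
        self.written_since_check += nbytes

    def on_tick(self) -> bool:
        """Called periodically; return True if the Region should be split."""
        if self.written_since_check < self.check_diff:
            return False  # not enough new data to justify an estimate
        self.written_since_check = 0
        size, keys = self.estimate()
        return size > self.max_size or keys > self.max_keys


if __name__ == "__main__":
    state = {"size": 100 * MIB, "keys": 10_000}
    checker = SplitChecker(estimate=lambda: (state["size"], state["keys"]))
    checker.on_write(10 * MIB)
    print(checker.on_tick())  # estimate runs, 100 MiB is under the 144 MiB maximum
```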

| username: h5n1 | Original post link

Actually, it is a reasonably balanced value found after many attempts.

The default size will soon be 1G.

| username: TiDBer_7S8XqKfl | Original post link

The system default is 96M, and you can change it with TiKV’s `coprocessor.region-split-size` configuration parameter.
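A sketch of how that parameter might look in a TiKV configuration file, rendered by a small hypothetical helper. It also sets `region-max-size` to 1.5 times the split size (the ratio TiKV documents as its default when the value is not set explicitly) and flags sizes outside the [48MiB, 258MiB] range quoted earlier in this thread or above 1 GiB. The helper is illustrative only; check the configuration reference for your TiKV version before changing these values.

```python
def tikv_region_config(split_size_mib: int) -> str:
    """Render a [coprocessor] snippet for a chosen Region size (hypothetical helper)."""
    recommended_lo, recommended_hi = 48, 258  # MiB, range quoted in this thread
    if split_size_mib > 1024:
        note = "  # above 1 GiB: not recommended"
    elif not recommended_lo <= split_size_mib <= recommended_hi:
        note = "  # outside the recommended range"
    else:
        note = ""
    max_size_mib = split_size_mib * 3 // 2  # keep the default 1.5x ratio
    return (
        "[coprocessor]\n"
        f'region-split-size = "{split_size_mib}MiB"{note}\n'
        f'region-max-size = "{max_size_mib}MiB"\n'
    )


if __name__ == "__main__":
    print(tikv_region_config(128))
```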

| username: vincentLi | Original post link

The answers from all the experts have been very helpful to me, thank you all.

| username: TiDBer_7S8XqKfl | Original post link

96M is probably a suitable value found through continuous testing.

| username: xuqiang76 | Original post link

It was probably chosen with Raft replication in mind, since each Region is a separate Raft group that is replicated as a unit.