Does Region split occur at the midpoint of the Region Key Range?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Region split 是不是从 Region Key Range 的中间位置二分分裂?

| username: Jellybean

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] all
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]

An expert raised the question above in a chat group. In the spirit of asking when you don't understand, I'd like to ask the community experts here as well: when a Region reaches the scheduling threshold and needs to split, where does the split start, and how is the split point determined?

The first thing that comes to mind is splitting at the middle of the Region's key range, but is that actually the principle?

| username: Jellybean | Original post link

  1. The official Split Region usage documentation (Split Region 使用文档 | PingCAP 文档中心) mentions that manually triggered splits can be either uniform or non-uniform.

  2. The default behavior can be seen in the source code, but I can't access GitHub at the moment without a VPN. The following source-code analysis article also gives some clues:
    TiKV Source Code Reading (20): Region Split (TiKV 源码解析系列文章(二十)Region Split 源码解析 | PingCAP)

split_check has the following types:

1. Check whether the Region's total or approximate size exceeds the threshold; the code is in size.rs.
2. Check whether the Region's total or approximate number of keys exceeds the threshold; the code is in keys.rs.
3. Split the Region's key range in half; the code is in half.rs. Like the PD-specified split key mentioned above, this check is also triggered by PD; currently it can only be triggered manually through pd-ctl and tikv-ctl commands (see the sketch after this list).
4. Split based on the table prefix of the key; the code is in table.rs. This check is turned off by default.
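
To make the difference between these checks concrete, here is a minimal, self-contained sketch. It is not TiKV's actual code: the `KeyEntry` struct, the thresholds, and the function names are made up purely to illustrate how a size-based check and a half-split check could pick a split key while accumulating sizes over a Region's keys.

```rust
/// A (key, value-size) pair observed while scanning a Region.
/// Purely illustrative; not TiKV's real KeyEntry type.
struct KeyEntry {
    key: Vec<u8>,
    size: u64,
}

/// Size-based check (in the spirit of size.rs): emit a split key once the
/// accumulated size crosses `split_size`.
fn size_based_split_key(entries: &[KeyEntry], split_size: u64) -> Option<Vec<u8>> {
    let mut acc = 0u64;
    for e in entries {
        acc += e.size;
        if acc >= split_size {
            return Some(e.key.clone());
        }
    }
    None
}

/// Half-split check (in the spirit of half.rs): pick the key at roughly half
/// of the total accumulated size.
fn half_split_key(entries: &[KeyEntry], total_size: u64) -> Option<Vec<u8>> {
    size_based_split_key(entries, total_size / 2)
}

fn main() {
    // A skewed Region: most of the data sits in the upper half of the key range.
    let entries: Vec<KeyEntry> = (0u8..10)
        .map(|i| KeyEntry { key: vec![i], size: if i < 5 { 1 } else { 100 } })
        .collect();
    let total: u64 = entries.iter().map(|e| e.size).sum();

    println!("size-based split at {:?}", size_based_split_key(&entries, 96));
    println!("half split at       {:?}", half_split_key(&entries, total));
}
```

Note that on a skewed Region like the one in `main`, the "half" split lands near the midpoint of the data size, not the midpoint of the key range.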

Those are the several checking methods under split_check. ApplyDelegate::exec_batch_split is where the split is actually carried out:

1. Update the version in the original Region's epoch; the new Regions inherit the original Region's epoch.
2. If right_derive is true, the original Region ends up as the rightmost Region after the split; if false, it ends up as the leftmost. Then set the start key and end key of each Region accordingly.
3. For each newly split Region, call write_peer_state and write_initial_apply_state to create metadata.

It seems the answer lies in ApplyDelegate::exec_batch_split.
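
As a rough illustration of steps 1 and 2 above, here is a simplified sketch of how a batch split could derive the new Regions' key ranges and epochs, including the right_derive behavior. The `Region` and `Epoch` structs below are made-up stand-ins, not the real protobuf types, and the function is not the actual ApplyDelegate::exec_batch_split.

```rust
#[derive(Clone, Debug)]
struct Epoch {
    version: u64,
    conf_ver: u64,
}

#[derive(Clone, Debug)]
struct Region {
    id: u64,
    start_key: Vec<u8>,
    end_key: Vec<u8>,
    epoch: Epoch,
}

/// Split `origin` at the sorted `split_keys`. The Region that keeps
/// `origin.id` ends up rightmost if `right_derive` is true, otherwise
/// leftmost; every resulting Region carries the bumped epoch.
fn batch_split(origin: &Region, split_keys: &[Vec<u8>], new_ids: &[u64], right_derive: bool) -> Vec<Region> {
    assert_eq!(split_keys.len(), new_ids.len());

    // Step 1: bump the version; the new Regions inherit the (bumped) epoch.
    let mut epoch = origin.epoch.clone();
    epoch.version += split_keys.len() as u64;

    // Step 2: boundaries are start_key, k1, ..., kn, end_key.
    let mut bounds = vec![origin.start_key.clone()];
    bounds.extend(split_keys.iter().cloned());
    bounds.push(origin.end_key.clone());

    // Decide which range the original Region keeps.
    let mut ids: Vec<u64> = new_ids.to_vec();
    if right_derive {
        ids.push(origin.id); // original Region keeps the rightmost range
    } else {
        ids.insert(0, origin.id); // original Region keeps the leftmost range
    }

    ids.iter()
        .zip(bounds.windows(2))
        .map(|(&id, w)| Region {
            id,
            start_key: w[0].clone(),
            end_key: w[1].clone(),
            epoch: epoch.clone(),
        })
        .collect()
}

fn main() {
    let origin = Region {
        id: 1,
        start_key: b"a".to_vec(),
        end_key: b"z".to_vec(),
        epoch: Epoch { version: 5, conf_ver: 2 },
    };
    // With right_derive = true, the new Region (id 100) takes ["a", "m")
    // and the original Region (id 1) keeps ["m", "z").
    for r in batch_split(&origin, &[b"m".to_vec()], &[100], true) {
        println!("{:?}", r);
    }
}
```

(In the real code the metadata of each resulting Region is then persisted via write_peer_state and write_initial_apply_state, as step 3 above describes.)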

| username: tidb菜鸟一只 | Original post link

By default, it splits from the middle. I have tested it before.

| username: Jellybean | Original post link

Looking at the source-code analysis article above: thanks to mechanisms such as the leader lease and timeout design, by default both leaders are still on the original TiKV store after the split. Later, if necessary, the cluster's scheduling strategy can rebalance them.

| username: dba远航 | Original post link

Bisection.

| username: zhanggame1 | Original post link

For an auto-increment primary key, is it reasonable to split from the middle?

| username: 有猫万事足 | Original post link

I’m really not familiar with Rust, and looking at the code just leaves me confused. :sweat_smile:
I need to learn Rust as soon as possible.

| username: TiDBer_小阿飞 | Original post link

What's the point of studying the principle of the bisection? As a newbie, I'm totally confused. The split point is the first row key of the block closest to the middle of the largest file in the largest store of the whole Region, and then the split is made at that value, right?

| username: chenhanneu | Original post link

The splitting of Regions always starts as close to the middle as possible. There are two strategies for selecting this position: scan and approximate. The difference between them is that the former determines the middle key by scanning the Region, while the latter obtains an approximate position by looking at the statistical information recorded in the SST file. Generally speaking, the former is more accurate, while the latter consumes less I/O and can be completed faster.
pd-ctl:

operator add split-region 1 --policy=approximate // Split Region 1 into two Regions based on approximate values
operator add split-region 1 --policy=scan // Split Region 1 into two Regions based on precise scan values
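
For intuition only, here is a sketch contrasting what the two policies trade off. The types and sampling scheme below are invented for illustration; the real approximate path in TiKV reads statistics from RocksDB SST properties rather than anything like this.

```rust
/// Made-up per-SST-file statistics: keys sampled from the file.
struct SstProps {
    sampled_keys: Vec<Vec<u8>>,
}

/// "scan" policy: walk every key/value in the Region, accumulate exact sizes,
/// and return the key where half of the total size is reached. Accurate, but
/// requires a full scan.
fn scan_middle_key(kvs: &[(Vec<u8>, Vec<u8>)]) -> Option<Vec<u8>> {
    let total: usize = kvs.iter().map(|(k, v)| k.len() + v.len()).sum();
    let mut acc = 0usize;
    for (k, v) in kvs {
        acc += k.len() + v.len();
        if acc * 2 >= total {
            return Some(k.clone());
        }
    }
    None
}

/// "approximate" policy: merge the sampled keys recorded per SST file and
/// take the middle sample. No scan needed, but only as accurate as the
/// sampling granularity.
fn approximate_middle_key(files: &[SstProps]) -> Option<Vec<u8>> {
    let mut samples: Vec<Vec<u8>> = files
        .iter()
        .flat_map(|f| f.sampled_keys.iter().cloned())
        .collect();
    samples.sort();
    samples.get(samples.len() / 2).cloned()
}

fn main() {
    let kvs: Vec<(Vec<u8>, Vec<u8>)> =
        (0u8..8).map(|i| (vec![i], vec![0u8; 16])).collect();
    println!("scan        -> {:?}", scan_middle_key(&kvs));

    let files = vec![SstProps { sampled_keys: (0u8..8).map(|i| vec![i]).collect() }];
    println!("approximate -> {:?}", approximate_middle_key(&files));
}
```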

| username: 随缘天空 | Original post link

Yes, it should split from the StartKey of the most central data in the entire Region.

| username: Jellybean | Original post link

If we manually add an operator, PD will of course split the Region the way we specify; that part should be easy for everyone to understand.

The main uncertainty is the default way TiKV splits Regions.

It's mentioned here that "The splitting and merging of Regions are managed by the PD scheduler, which decides whether to split or merge Regions based on metrics such as Region size, data volume, and access frequency, and chooses appropriate split or merge points."

| username: forever | Original post link

Have you tested it? For example, with a key range of 1-10 where 1-5 has very few values (or only one) and most of the values fall in 5-10, will it still split from the middle, or will it take the number of values into account?

| username: Billmay表妹 | Original post link

Hahaha~ It looks like you all have looked at the source code. That’s awesome~

| username: Jellybean | Original post link

Mainly, Teacher Bao's question is simple yet probing, and it sparked a sense of curiosity and exploration. :grinning:

| username: forever | Original post link

If the skew is large enough and most of the new values keep landing at the end, then splitting down the middle would just keep splitting.

| username: 随缘天空 | Original post link

It shouldn't grow indefinitely. Each Region has a maximum size limit, so the range between a Region's startKey and endKey is bounded in size; once it exceeds that limit, the largest Region is automatically split from the middle. Are you familiar with HBase? The principle is basically similar to TiDB's; in HBase, the default maximum size of a single region is 10 GB.

| username: forever | Original post link

It seems what you're talking about isn't quite the issue I'm raising. What I mean is: suppose the split condition is already met, and there are only 10 values between 100000-160000 but 50000 values between 160000-200000. If the split is made at the middle of the key range, one Region ends up with 10 values and the other with 50000. Will the Region with 50000 values be split again? If the skew is large enough, will it keep splitting?

| username: 随缘天空 | Original post link

Splitting is based on the Region's data size, not on the number of values. Officially, a single Region is split once it exceeds 144 MB, so you won't see a Region grow to 1 GB and then need to be split many times. Therefore, there won't be a case where the skew is so large that it keeps splitting; it is handled automatically once the Region reaches the size threshold.

| username: 随缘天空 | Original post link

Regions can be both split and merged. In the situation you describe, suppose there are three Regions with ranges 1-100, 100-200, and 200-300, where the 1-100 and 200-300 ranges store very little data and only the 100-200 range stores a lot. That is a hotspot problem. When the data in the 100-200 range grows beyond the size limit of a single Region, it may be split into multiple Regions, but it won't be split over and over to an excessive degree. In addition, Regions that are too small may be merged into new Regions.

| username: 像风一样的男子 | Original post link

Maximum Region capacity (region-max-size) = size of the new Region after splitting (region-split-size) / 2 * 3
region-max-keys = region-split-keys / 2 * 3
Isn’t the default split at two-thirds?
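
If I'm reading that relationship correctly, the arithmetic works out as follows. This assumes the commonly cited defaults (region-split-size = 96 MiB), which may differ between TiKV versions:

```rust
fn main() {
    let mib = 1024 * 1024u64;

    // Assumed default; check the [coprocessor] section of your tikv.toml for the real value.
    let region_split_size = 96 * mib;

    // The relationship quoted above: region-max-size = region-split-size / 2 * 3.
    let region_max_size = region_split_size / 2 * 3;

    println!("region-split-size = {} MiB", region_split_size / mib); // 96
    println!("region-max-size   = {} MiB", region_max_size / mib); // 144

    // A size-triggered split fires once the Region exceeds region-max-size, and
    // the split key is placed at roughly region-split-size of accumulated data,
    // so the two parts are about 2/3 and 1/3 of the original, not an exact half.
    let left = region_split_size;
    let right = region_max_size - region_split_size;
    println!("split parts = {} MiB + {} MiB", left / mib, right / mib); // 96 + 48
}
```

So on this reading, the "two-thirds" observation matches the default size-based check, while an exact midpoint only applies to the manually triggered half split.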