Pre-Split Regions - Not Showing the Details of the Regions

I created a table called ‘instant_cdr_split_region’ with the following command:
CREATE TABLE instant_cdr_split_region (
ID bigint(20) NOT NULL AUTO_INCREMENT,
row_id int(11) NOT NULL,
msisdn varchar(12) NOT NULL,
date_id int(11) DEFAULT NULL,
channel varchar(12) DEFAULT NULL,
usage_type varchar(12) DEFAULT NULL,
ac_type varchar(15) DEFAULT NULL,
voice_amt int(11) DEFAULT NULL,
sms_amt int(11) DEFAULT NULL,
data_amt int(11) DEFAULT NULL,
field1 varchar(20) DEFAULT NULL,
field2 varchar(20) DEFAULT NULL,
field3 varchar(20) DEFAULT NULL,
field4 varchar(20) DEFAULT NULL,
field5 varchar(20) DEFAULT NULL,
field6 varchar(20) DEFAULT NULL,
field7 varchar(20) DEFAULT NULL,
field8 varchar(20) DEFAULT NULL,
field9 varchar(20) DEFAULT NULL,
field10 varchar(20) DEFAULT NULL,
field11 varchar(20) DEFAULT NULL,
field12 varchar(20) DEFAULT NULL,
field13 varchar(20) DEFAULT NULL,
field14 varchar(20) DEFAULT NULL,
field15 varchar(20) DEFAULT NULL,
field16 varchar(20) DEFAULT NULL,
field17 varchar(20) DEFAULT NULL,
field18 varchar(20) DEFAULT NULL,
field19 varchar(20) DEFAULT NULL,
field20 varchar(20) DEFAULT NULL,
s_date date NOT NULL,
date_idp int(11) DEFAULT NULL,
PRIMARY KEY (ID,row_id) NONCLUSTERED,
KEY msisdn_s_date_idx (msisdn,s_date)
) SHARD_ROW_ID_BITS = 8 PRE_SPLIT_REGIONS = 3;
Before loading any records into the table, I checked the region details using the command ‘SHOW TABLE instant_cdr_split_region REGIONS;’ and was able to see 9 regions. Right after loading 100,000 records into the table, I could still see the details of all 9 regions. However, when I check the region details again after an hour or so, I sometimes see only one region. All the execution results are pasted below. Is this a bug, or did I miss some configuration?
Can anyone please help with this?
Note: I have enabled the tidb_scatter_region variable.
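For reference, the variable can be checked and enabled from the MySQL client roughly like this (a minimal sketch; whether the value is shown as 0/1 or OFF/ON depends on the TiDB version):

SHOW GLOBAL VARIABLES LIKE 'tidb_scatter_region';
-- Scatter newly pre-split regions across TiKV stores right after CREATE TABLE
SET GLOBAL tidb_scatter_region = 1;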
After creating the table
mysql> SHOW TABLE instant_cdr_split_region REGIONS;
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS | SCHEDULING_CONSTRAINTS | SCHEDULING_STATE |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| 476753005 | t_137_i_2_ | t_137_r_1152921504606846976 | 476753007 | 194005 | 476753006, 476753007, 476753008 | 0 | 442 | 0 | 1 | 0 | | |
| 476753009 | t_137_r_1152921504606846976 | t_137_r_2305843009213693952 | 476753012 | 197001 | 476753010, 476753011, 476753012 | 0 | 39 | 0 | 1 | 0 | | |
| 476753013 | t_137_r_2305843009213693952 | t_137_r_3458764513820540928 | 476753014 | 1 | 476753014, 476753015, 476753016 | 0 | 0 | 0 | 1 | 0 | | |
| 476753017 | t_137_r_3458764513820540928 | t_137_r_4611686018427387904 | 476753019 | 194005 | 476753018, 476753019, 476753020 | 0 | 0 | 0 | 1 | 0 | | |
| 476753021 | t_137_r_4611686018427387904 | t_137_r_5764607523034234880 | 476753024 | 197001 | 476753022, 476753023, 476753024 | 0 | 27 | 0 | 1 | 0 | | |
| 476753025 | t_137_r_5764607523034234880 | t_137_r_6917529027641081856 | 476753026 | 1 | 476753026, 476753027, 476753028 | 0 | 0 | 0 | 1 | 0 | | |
| 476753029 | t_137_r_6917529027641081856 | t_137_r_8070450532247928832 | 476753032 | 197001 | 476753030, 476753031, 476753032 | 0 | 39 | 0 | 1 | 0 | | |
| 476749073 | t_137_r_8070450532247928832 | | 476749076 | 197001 | 476749074, 476749075, 476749076 | 0 | 1331 | 361346 | 1 | 124 | | |
| 476753037 | t_137_i_1_ | t_137_i_2_ | 476753038 | 1 | 476753038, 476753039, 476753040 | 0 | 39 | 0 | 1 | 0 | | |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
9 rows in set (0.04 sec)
After loading 1 lakh (100,000) records
mysql> SHOW TABLE instant_cdr_split_region REGIONS;
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS | SCHEDULING_CONSTRAINTS | SCHEDULING_STATE |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| 476753005 | t_137_i_2_ | t_137_r_1152921504606846976 | 476753007 | 194005 | 476753006, 476753007, 476753008 | 0 | 0 | 0 | 17 | 1397 | | |
| 476753009 | t_137_r_1152921504606846976 | t_137_r_2305843009213693952 | 476753012 | 197001 | 476753010, 476753011, 476753012 | 0 | 39 | 0 | 1 | 0 | | |
| 476753013 | t_137_r_2305843009213693952 | t_137_r_3458764513820540928 | 476753014 | 1 | 476753014, 476753015, 476753016 | 0 | 0 | 0 | 1 | 0 | | |
| 476753017 | t_137_r_3458764513820540928 | t_137_r_4611686018427387904 | 476753019 | 194005 | 476753018, 476753019, 476753020 | 0 | 0 | 0 | 1 | 0 | | |
| 476753021 | t_137_r_4611686018427387904 | t_137_r_5764607523034234880 | 476753024 | 197001 | 476753022, 476753023, 476753024 | 0 | 27 | 0 | 1 | 0 | | |
| 476753025 | t_137_r_5764607523034234880 | t_137_r_6917529027641081856 | 476753026 | 1 | 476753026, 476753027, 476753028 | 0 | 0 | 0 | 1 | 0 | | |
| 476753029 | t_137_r_6917529027641081856 | t_137_r_8070450532247928832 | 476753032 | 197001 | 476753030, 476753031, 476753032 | 0 | 39 | 0 | 1 | 0 | | |
| 476749073 | t_137_r_8070450532247928832 | | 476749076 | 197001 | 476749074, 476749075, 476749076 | 0 | 0 | 218640 | 10 | 75224 | | |
| 476753037 | t_137_i_1_ | t_137_i_2_ | 476753038 | 1 | 476753038, 476753039, 476753040 | 0 | 0 | 0 | 24 | 0 | | |
+-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
9 rows in set (0.03 sec)
Currently showing only one region
mysql> SHOW TABLE instant_cdr_split_region REGIONS;
+-----------+-----------+---------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS | SCHEDULING_CONSTRAINTS | SCHEDULING_STATE |
+-----------+-----------+---------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
| 476753037 | t_137_ | | 476753038 | 1 | 476753038, 476753039, 476753040 | 0 | 0 | 83120 | 53 | 309928 | | |
+-----------+-----------+---------+-----------+-----------------+---------------------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
1 row in set (0.01 sec)

Based on the official documentation, it is expected that the region details may change after loading data into a table. When you create a table, TiDB splits it into one or more regions; the maximum region size is determined by the split-region-size configuration item. When you load data, the sizes of the regions change, and regions may be split or merged, which can change the number of regions. Therefore, it is normal to see different region details before and after loading data into the table.

In addition, the region split size is not the only factor that affects the number of regions. The data distribution and the number of TiKV nodes also play a role. Therefore, it is possible to see different region details even if you load the same amount of data into the same table at different times.

If you want to control the number of regions, you can adjust the split-region-size configuration item or use the SPLIT TABLE statement to split the table into regions manually. However, please note that an inappropriate manual split may cause hotspots and affect the performance of the cluster.
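To illustrate the two options above, something like the following could be used (a rough sketch; the BETWEEN bounds are only illustrative and should be chosen to match the table's actual row-ID range, and exact config names may differ by version):

-- Current TiKV region split size (96 MiB by default)
SHOW CONFIG WHERE type = 'tikv' AND name = 'coprocessor.region-split-size';

-- Manually split the table's hidden _tidb_rowid range into 8 evenly spaced regions
-- (bounds are illustrative; with SHARD_ROW_ID_BITS the rowid spans a wide signed 64-bit range)
SPLIT TABLE instant_cdr_split_region BETWEEN (0) AND (9223372036854775807) REGIONS 8;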

Thanks for the response @billmay
My concern is that I used the pre-split region logic when creating the table, specifying shard_row_id_bits = 8 and pre_split_regions = 3, which should result in 2^3 = 8 pre-split data regions (plus the index region, 9 in total, as shown above). However, when I query the table after loading data, I sometimes see that the regions have merged into one. This doesn’t make sense to me, since I would expect the number of regions not to decrease, but rather to increase automatically as new regions are added to accommodate large amounts of data.
Can you please comment on this?

Pre-split regions will be merged back into one if you don’t write any data to them. Could you run select count(*) from instant_cdr_split_region; to check whether the records have actually been written into the table?

Yes, the data is present in the table.
My concern is that if pre-split regions are re-merged into one when no data is written to them, it could later lead to hotspot issues when loading large amounts of data into the table. Additionally, once the unused regions are re-merged, the table reverts to its default behavior of creating new regions only when the current region size exceeds a certain limit.
Can you please comment on this?
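For context, the merge behaviour is governed by PD's region-merge settings: regions smaller than schedule.max-merge-region-size / schedule.max-merge-region-keys become merge candidates once schedule.split-merge-interval (1h by default) has elapsed since the split, which might explain why the pre-split regions disappear after roughly an hour. A rough sketch of how these could be inspected from SQL (config names assume a reasonably recent TiDB/PD version; they can also be tuned via SET CONFIG or pd-ctl if needed):

-- Inspect PD's merge thresholds; regions below these limits are merge candidates
SHOW CONFIG WHERE type = 'pd' AND name LIKE 'schedule.max-merge-region%';
SHOW CONFIG WHERE type = 'pd' AND name = 'schedule.split-merge-interval';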

According to the official documentation, pre-splitting regions can help avoid hotspot problems caused by uneven data distribution. However, if the unused regions are re-merged, hotspot issues may still appear when loading large amounts of data into the table. To avoid this, you can manually split the regions again to ensure an even data distribution.

Regarding the default behavior of creating new regions, it is true that TiDB creates new regions only when the current region size exceeds a certain limit. This is because TiDB’s scheduling unit is the region, a sorted range of key-value pairs with a certain maximum size (96 MiB by default). If a table is small and frequently accessed, it may suffer from hotspot problems even if it is pre-split. In that case, it is recommended to keep the small table in a memory cache such as Redis, or directly in the application’s memory.

In summary, pre-splitting regions can help avoid hotspot problems caused by uneven data distribution, but it may not be a perfect solution in all cases. It is important to monitor the data distribution and adjust the region-splitting strategy accordingly to ensure optimal performance.
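As one way to monitor the distribution mentioned above, the per-region size and key counts for a single table can also be queried from information_schema (a rough sketch; column names follow recent TiDB versions, and the schema name below is a placeholder):

-- Per-region size/keys for one table, including its index regions
SELECT REGION_ID, IS_INDEX, APPROXIMATE_SIZE, APPROXIMATE_KEYS
FROM information_schema.TIKV_REGION_STATUS
WHERE DB_NAME = 'test'                      -- replace with the actual schema name
  AND TABLE_NAME = 'instant_cdr_split_region'
ORDER BY APPROXIMATE_SIZE DESC;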