How to Understand SHARD_ROW_ID_BITS

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: SHARD_ROW_ID_BITS怎么理解

| username: 江湖故人

Is SHARD_ROW_ID_BITS the 2^n number of buckets hashed by _tidb_rowid for non-clustered tables? Why is it still occupying only one region even though I set it to 4?

| username: 江湖故人 | Original post link

set @@cte_max_recursion_depth = 100000;
drop table t1;
CREATE TABLE t1 (c int, c2 varchar(100)) SHARD_ROW_ID_BITS = 4;
insert into t1
WITH RECURSIVE nr(n) AS
    (SELECT 1                               
      UNION ALL SELECT n + 1                
      FROM nr WHERE n < 100000            
    ) SELECT n, 'abcdefgabcdefgabcdefgabcdefgabcdefg' FROM nr;

This way, only one region is found:

SHOW TABLE t1 regions;
REGION_ID|START_KEY|END_KEY |LEADER_ID|LEADER_STORE_ID|PEERS       |SCATTERING|WRITTEN_BYTES|READ_BYTES|APPROXIMATE_SIZE(MB)|APPROXIMATE_KEYS|SCHEDULING_CONSTRAINTS|SCHEDULING_STATE|
---------+---------+--------+---------+---------------+------------+----------+-------------+----------+--------------------+----------------+----------------------+----------------+
       52|t_125_   |78000000|       53|              1|53, 174, 238|         0|     27112746|   1515420|                  71|          623656|                      |                |
      

select _tidb_rowid, c, c2 from t1; 
_tidb_rowid        |c  |c2                                 |
-------------------+---+-----------------------------------+
8070450532247928833|  1|abcdefgabcdefgabcdefgabcdefgabcdefg|
8070450532247928834|  2|abcdefgabcdefgabcdefgabcdefgabcdefg|
8070450532247928835|  3|abcdefgabcdefgabcdefgabcdefgabcdefg|
...
| username: 江湖故人 | Original post link

I found it. You also need to add the pre_split_regions attribute:

CREATE TABLE t1 (c int, c2 varchar(100)) 
SHARD_ROW_ID_BITS = 4 pre_split_regions=2;
| username: 哈喽沃德 | Original post link

The SHARD_ROW_ID_BITS parameter is used to define the number of bits in the ROW_ID that represent the shard ID, helping TiDB with the distribution and management of data in sharded tables.

| username: dba远航 | Original post link

The SHARD_ROW_ID_BITS parameter is used to define the number of bits in the ROW_ID that represent the shard ID. Note that it is a power of 2.

| username: xfworld | Original post link

Manual transmission,

Manually switch the sharding of region data to avoid hotspot issues…

| username: WinterLiu | Original post link

Got it, pre-split the table when creating it.

| username: wangccsy | Original post link

Just passing by to learn a bit.

| username: DBAER | Original post link

The main thing is to avoid writing hotspots. The business must be written sequentially, but after being scattered, the read performance will also decrease. This needs to be balanced by yourself.

| username: Soysauce520 | Original post link

To improve write performance and avoid the issue of writing to only one region, note that this can only be done during the table creation stage.

| username: yulei7633 | Original post link

Sure, please provide the Chinese text you need translated.

| username: 春风十里 | Original post link

Is it necessary to add this parameter? When the data volume is large, does SHARD_ROW_ID_BITS = 4 split the region according to 2 to the power of 4?

| username: 江湖故人 | Original post link

I inserted hundreds of thousands and it’s still on one region.

| username: 江湖故人 | Original post link

Thank you all for your replies :pray:

SHARD_ROW_ID_BITS is used to randomly scatter the row IDs generated by the _tidb_rowid column. PRE_SPLIT_REGIONS is used to pre-split regions after the table is created.

Example:

-- The value of PRE_SPLIT_REGIONS must be less than or equal to SHARD_ROW_ID_BITS.
create table t (a int, b int) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS=3;
  • SHARD_ROW_ID_BITS = 4 means that the values of tidb_rowid will be randomly distributed into 16 (16=2^4) range intervals.
  • PRE_SPLIT_REGIONS=3 means that 8 (2^3) Regions will be pre-split after the table is created.

After starting to write data into table t, the data will be written into the pre-split 8 Regions, thus avoiding the write hotspot issue that occurs when there is only one Region right after the table is created.

TiDB High Concurrency Write Best Practices | PingCAP Documentation Center

| username: FutureDB | Original post link

Yes, while this scattering avoids write hotspots and improves write performance to some extent, it significantly reduces read performance. This is the reason we have not used this parameter; it’s difficult to balance the pros and cons.

| username: lemonade010 | Original post link

Just a newbie passing by to learn.

| username: zhaokede | Original post link

Using AUTO_RANDOM, you can immediately see the data distributed across multiple regions.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.