How to Convert VARCHAR Clustered Primary Key to Region's Start_Key

translator_bot · June 22, 2024, 10:55pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: varchar类型聚簇主键与region的start_key怎么换算

| username: BossChao

[TiDB Usage Environment] POC

[Encountered Issues: Problem Phenomenon and Impact]
The primary keys in the existing system are all UUIDs. It was found that the largest letter is ‘f’ and the smallest number is ‘0’. I performed a split region on the table using the minimum value 000…000 and the maximum value fff…fff, but found that it did not match the region’s start_key.

[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]

translator_bot · June 22, 2024, 10:55pm

| username: tidb狂热爱好者 | Original post link

There is also a hidden table ID and its corresponding one.

translator_bot · June 22, 2024, 10:55pm

| username: BossChao | Original post link

Clustered primary key tables do not have _tidb_rowid. The hidden primary key column is of type bigint and is consistent with the key of the region.

translator_bot · June 22, 2024, 10:55pm

| username: tidb狂热爱好者 | Original post link

This is a varchar.

translator_bot · June 22, 2024, 10:55pm

| username: 胡杨树旁 | Original post link

Clustered index table, with the primary key as part of the key value, the primary key will be arranged in descending order, right? It seems that you can extract the UUID and arrange it in ascending order to see if it matches the region’s start_key.

translator_bot · June 22, 2024, 10:55pm

| username: 裤衩儿飞上天 | Original post link

The primary key field id is varchar.

translator_bot · June 22, 2024, 10:55pm

| username: 我是咖啡哥 | Original post link

Reference documentation:

Uniform Splitting

Since row_id is an integer, the keys to be split can be deduced based on the specified lower_value, upper_value, and region_num. TiDB first calculates the step (step = (upper_value - lower_value)/region_num), then splits the key at each step interval between lower_value and upper_value, ultimately creating region_num Regions.

For example, for table t, if you want to evenly split 16 Regions between minInt64 and maxInt64, you can use the following statement:

SPLIT TABLE t BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 16;

This statement will evenly split table t into 16 Regions between minInt64 and maxInt64. If the range of the primary key is known to be smaller, for example, only between 0 and 1000000000, you can use 0 and 1000000000 instead of the above minInt64 and maxInt64 to split the Regions.

SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 16;

Non-uniform Splitting

If the data is known to be unevenly distributed, for example, if you want to split one Region from -inf to 10000, another from 10000 to 90000, and another from 90000 to +inf, you can manually specify the points to split the Regions, as shown below:

SPLIT TABLE t BY (10000), (90000);

Everyone kept emphasizing varchar, so I thought varchar couldn’t be split. I specifically tested it, and it can be done.

root@127.0.0.1:4000[hqh]>show create table t1;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                           |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` varchar(64) NOT NULL,
  `intkey` int(11) NOT NULL,
  `pad1` varbinary(1024) DEFAULT NULL,
  KEY `intkey` (`intkey`),
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
|      1003 | t_110_    |         |      1004 |               1 | 1004  |          0 |          2700 |       1355 |                    1 |                0 |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
1 row in set (0.03 sec)
root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>SPLIT TABLE t1 BY (10000), (90000);
+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
|                  2 |                    1 |
+--------------------+----------------------+
1 row in set (0.01 sec)

root@127.0.0.1:4000[hqh]>
root@127.0.0.1:4000[hqh]>show table t1 regions;
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY     | END_KEY       | LEADER_ID | LEADER_STORE_ID | PEERS  | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
|    108003 | t_110_        | t_110_r_10000 |    108004 |               1 | 108004 |          0 |             0 |          0 |                    1 |                0 |
|    108005 | t_110_r_10000 | t_110_r_90000 |    108006 |               1 | 108006 |          0 |             0 |          0 |                    1 |                0 |
|      1003 | t_110_r_90000 |               |      1004 |               1 | 1004   |          0 |             0 |          0 |                    1 |                0 |
+-----------+---------------+---------------+-----------+-----------------+--------+------------+---------------+------------+----------------------+------------------+
3 rows in set (0.01 sec)

translator_bot · June 22, 2024, 10:55pm

| username: 胡杨树旁 | Original post link

Looking at his table structure, the primary key is a clustered index. If it is a clustered index, there should be no row_id, right?

translator_bot · June 22, 2024, 10:55pm

| username: 我是咖啡哥 | Original post link

You can split

root@1:4000[tidbdemo]>show create table hot_word\G
*************************** 1. row ***************************
       Table: hot_word
Create Table: CREATE TABLE `hot_word` (
  `id` varchar(64) NOT NULL,
  `word` varchar(1024) NOT NULL DEFAULT '',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

root@1:4000[tidbdemo]>show table hot_word regions;
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY | END_KEY | LEADER_ID | LEADER_STORE_ID | PEERS | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
|         2 | t_188_    |         |         3 |               1 | 3     |          0 |             0 |          0 |                    1 |                0 |
+-----------+-----------+---------+-----------+-----------------+-------+------------+---------------+------------+----------------------+------------------+
1 row in set (0.02 sec)

root@1:4000[tidbdemo]>SPLIT TABLE hot_word BETWEEN (-9223372036854775808) AND (9223372036854775807) REGIONS 4;

+--------------------+----------------------+
| TOTAL_SPLIT_REGION | SCATTER_FINISH_RATIO |
+--------------------+----------------------+
|                  3 |                    1 |
+--------------------+----------------------+
1 row in set (2.18 sec)

root@1:4000[tidbdemo]>
root@1:4000[tidbdemo]>show table hot_word regions;
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
| REGION_ID | START_KEY                  | END_KEY                    | LEADER_ID | LEADER_STORE_ID | PEERS   | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS |
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
|   2556003 | t_188_                     | t_188_r_0130377232733435f1 |   2556004 |               1 | 2556004 |          0 |             0 |          0 |                    1 |                0 |
|   2556005 | t_188_r_0130377232733435f1 | t_188_r_013335b232b33534b0 |   2556006 |               1 | 2556006 |          0 |             0 |          0 |                    1 |                0 |
|   2556007 | t_188_r_013335b232b33534b0 | t_188_r_013633f232f336336f |   2556008 |               1 | 2556008 |          0 |             0 |          0 |                    1 |                0 |
|         2 | t_188_r_013633f232f336336f |                            |         3 |               1 | 3       |          0 |             0 |          0 |                    1 |                0 |
+-----------+----------------------------+----------------------------+-----------+-----------------+---------+------------+---------------+------------+----------------------+------------------+
4 rows in set (0.10 sec)