After a table is created, is there only one region regardless of the number of TiKV nodes? It will only split as the amount of data increases, right?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 一个表建立之后,不管多少个 tikv 节点也只有一个region 吗? 只有随着数据量的增多,才会分裂吧?

| username: 数据库菜鸡

[Encountered Issue: Issue Phenomenon and Impact]
After a table is created, does it have only one Region regardless of the number of TiKV nodes? If so, is its read/write performance limited by the performance of a single Region?

| username: 像风一样的男子 | Original post link

A Region is a logical concept, similar to virtual memory: it divides the key space into contiguous segments. As data is continuously written, once the approximate size of the keys within a Region reaches the configured maximum, the Region splits. The split does not rewrite the data itself; only some management metadata changes, so the underlying RocksDB data is unaffected. Also note that when writing table data, the index, lock, Raft log, and other data must be written to disk as well.
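
For reference, the split thresholds mentioned above can be inspected from SQL. A minimal sketch, assuming a TiDB version that supports SHOW CONFIG and the default TiKV config key names:

```sql
-- Inspect the TiKV settings that govern Region splitting:
-- coprocessor.region-max-size triggers the split check, and
-- coprocessor.region-split-size is the target size after a split.
SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'coprocessor.region%';
```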

| username: 数据库菜鸡 | Original post link

Parallelism can only happen between different Regions, right? One Region corresponds to one Raft group, and each Raft group processes its requests serially? Is that the correct understanding? Thanks.

| username: 像风一样的男子 | Original post link

Each Region is responsible for maintaining a segment of continuous data in the cluster (approximately 96 MiB by default). Each segment of data is stored in multiple replicas on different Stores (the default configuration is 3 replicas), and each replica is called a Peer. Multiple Peers of the same Region synchronize data through the raft protocol, so Peer is also used to refer to members in a raft instance. TiKV uses a multi-raft mode to manage data, meaning each Region corresponds to an independently running raft instance, which we also refer to as a Raft Group.
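
As a concrete illustration, here is a sketch of listing the Peers of each Region belonging to a table (the database test and table t are hypothetical; the information_schema tables used here exist in recent TiDB versions):

```sql
-- Each distinct REGION_ID below is one Raft Group; the rows sharing
-- that ID are its Peers, each on a different TiKV store (STORE_ID).
SELECT p.REGION_ID, p.PEER_ID, p.STORE_ID, p.IS_LEADER
FROM information_schema.TIKV_REGION_PEERS p
JOIN information_schema.TIKV_REGION_STATUS s ON p.REGION_ID = s.REGION_ID
WHERE s.DB_NAME = 'test' AND s.TABLE_NAME = 't'
ORDER BY p.REGION_ID, p.PEER_ID;
```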

| username: 数据库菜鸡 | Original post link

If the keys of the data written by the user are fairly random, then initially there is only one Region, and as more data is written it can grow to, say, 100 Regions. Is the overall read/write performance with 100 Regions then 100 times that of the initial single Region?

| username: zhanggame1 | Original post link

With the default of three replicas, a newly created table's single Region has a replica on each of three TiKV nodes. You can see the specific distribution with show table xxx regions. How many Regions a table has right after creation depends on certain parameters.
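
For example, a quick sketch with a hypothetical table t:

```sql
CREATE TABLE t (id BIGINT PRIMARY KEY, v VARCHAR(255));

-- A freshly created table typically shows a single Region;
-- LEADER_STORE_ID is the TiKV instance serving its reads and writes.
SHOW TABLE t REGIONS;
```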

| username: 像风一样的男子 | Original post link

Having more regions is not necessarily better. In scenarios with large amounts of data, an excessive number of regions may lead to increased resource overhead and performance degradation.

| username: 有猫万事足 | Original post link

The upper limit of throughput is constrained by the number of TiKV instances and disk throughput. It is not necessarily 100 times; the more evenly these 100 regions are distributed across TiKV instances, and the greater the disk throughput of each TiKV, the better the performance will be.

In my own tests, with each TiKV instance on its own data disk, 4 TiKV instances increased write throughput by roughly 1/3 compared to 3 TiKV instances.
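
One way to check how evenly the Region leaders are spread across stores, sketched against information_schema (skewed counts suggest uneven load):

```sql
-- Count Region leaders per TiKV store; a balanced cluster shows
-- roughly equal leader_count values across stores.
SELECT STORE_ID, COUNT(*) AS leader_count
FROM information_schema.TIKV_REGION_PEERS
WHERE IS_LEADER = 1
GROUP BY STORE_ID
ORDER BY leader_count DESC;
```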

| username: forever | Original post link

You can use pre_split_regions to have a table pre-split into evenly distributed Regions as soon as it is created.
Split Region Usage Documentation | PingCAP Documentation Center
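
A sketch of both pre-splitting at creation time and splitting an existing table manually (the table name t2 and the Region counts are hypothetical; PRE_SPLIT_REGIONS requires SHARD_ROW_ID_BITS and a table without a clustered primary key):

```sql
-- Pre-split the table into 2^2 = 4 Regions at creation time;
-- PRE_SPLIT_REGIONS must not exceed SHARD_ROW_ID_BITS.
CREATE TABLE t2 (id BIGINT, v VARCHAR(255))
SHARD_ROW_ID_BITS = 4
PRE_SPLIT_REGIONS = 2;

-- An existing table can also be split by row-ID range; with
-- SHARD_ROW_ID_BITS the row IDs scatter over the whole int64 range.
SPLIT TABLE t2 BETWEEN (0) AND (9223372036854775807) REGIONS 16;
```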

| username: cassblanca | Original post link

After a table is created, will there only be one region regardless of the number of TiKV nodes? In this case, the read and write performance will be limited by the performance of a single region, right?

Data is distributed across different Regions, with each TiKV node managing a portion of them, so a table may be spread across multiple TiKV nodes. However, each Region has only one Leader replica, and that Leader handles all read and write requests for the Region. If a table has only one Region, its read and write performance is therefore limited by that single Region; when the load on it gets too heavy, performance degrades.