What is the purpose of shard bits in the auto_random field in a TiDB table?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 表中 auto_random 的字段的分片位有什么用?

| username: myzz

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.0
[Resource Configuration] 5 TiKV Nodes
What is the purpose of the shard bits in the auto_random field in a TiDB table?
I don’t quite understand what they actually do. Can someone explain?
After the table is created, do the shard bits scatter the generated IDs when rows are inserted? Does that ensure the data is evenly distributed across TiKV nodes? If so, how does it work? Is it because the shard bits change the ID generation algorithm? Any help is appreciated.

| username: tidb菜鸟一只

With AUTO_INCREMENT, consecutive inserts get consecutive IDs, so the write hotspot may sit on a single Region. With AUTO_RANDOM, the shard bits scatter the generated IDs, so the write load is shared by multiple Regions.
For more details, see here:
SHARD_ROW_ID_BITS | PingCAP Documentation Center
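
To make the mechanism concrete, here is a minimal Python sketch of how an AUTO_RANDOM-style value could be assembled. It assumes a simplified layout for a signed BIGINT column (1 sign bit, then S shard bits, then 63 - S incrementing bits) and draws the shard bits from a plain random number generator; as far as I understand, TiDB derives them from the transaction start time instead. The function names are made up for illustration and are not TiDB APIs.

```python
import random


def make_auto_random_id(counter: int, shard_bits: int, rng: random.Random) -> int:
    """Combine random shard bits (high bits) with an incrementing counter (low bits)."""
    # Assumption: shard bits are limited to 1..15, as I recall from the TiDB docs.
    if not 1 <= shard_bits <= 15:
        raise ValueError("shard_bits must be between 1 and 15")
    incr_bits = 63 - shard_bits  # bit 63 is left alone as the sign bit
    if not 0 <= counter < (1 << incr_bits):
        raise ValueError("counter does not fit in the incrementing bits")
    shard = rng.getrandbits(shard_bits)  # stand-in for TiDB's real shard source
    return (shard << incr_bits) | counter


def shard_of(row_id: int, shard_bits: int) -> int:
    """Recover the shard prefix from a generated ID."""
    return row_id >> (63 - shard_bits)


if __name__ == "__main__":
    rng = random.Random()
    for counter in range(1, 6):
        row_id = make_auto_random_id(counter, 3, rng)
        print(counter, row_id, shard_of(row_id, 3))
```

Because the shard prefix occupies the highest bits, rows inserted one after another land in different key ranges, even though the low bits keep counting up.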

| username: myzz

I have 5 TiKV nodes. Should I set the shard bits to 2 or 3? With 2, will one node get no load? And what happens with 3?

| username: tidb菜鸟一只

Setting 2 or 3 is fine for 5 TiKV nodes.
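
As a rough back-of-the-envelope check for the 2-versus-3 question: S shard bits give 2^S distinct ID prefixes, so 4 write ranges for 2 bits and 8 for 3. The sketch below assumes, purely for illustration, one Region leader per prefix, placed round-robin across the nodes. A real cluster splits Regions further and lets PD schedule leaders, so actual placement will differ; `leaders_per_node` is a hypothetical helper, not a TiDB tool.

```python
from typing import List


def leaders_per_node(shard_bits: int, nodes: int) -> List[int]:
    """Count leaders per node, assuming one leader per shard prefix (illustrative only)."""
    shards = 1 << shard_bits  # 2^S distinct shard prefixes
    counts = [0] * nodes
    for shard in range(shards):
        counts[shard % nodes] += 1  # assumed round-robin placement, not PD's real logic
    return counts


if __name__ == "__main__":
    for bits in (2, 3):
        print(bits, leaders_per_node(bits, nodes=5))
```

Under that assumption, 2 bits give 4 leaders for 5 nodes and 3 bits give 8, so neither divides evenly across 5 nodes.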

| username: 考试没答案

A different perspective: random unique IDs can solve the hotspot issue, but Regions are split by key range. Does this improve overall performance? Is the impact of later Region merging significant? Will the random values generated within a short period be spread over a very large span, for example 1, 10000, 100000, 999999?
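
On the question of value spans, a small sketch may help. It assumes a simplified layout of 1 sign bit, S shard bits, and 63 - S incrementing bits (my assumption about how the bits are arranged, with a hypothetical function name). Under that layout, each shard prefix owns one contiguous slice of the positive BIGINT range, so IDs generated back to back can differ by a multiple of 2^(63 - S).

```python
from typing import List, Tuple


def shard_ranges(shard_bits: int) -> List[Tuple[int, int]]:
    """Return the inclusive (low, high) ID range owned by each shard prefix."""
    width = 1 << (63 - shard_bits)  # number of IDs available under one prefix
    return [(s * width, (s + 1) * width - 1) for s in range(1 << shard_bits)]


if __name__ == "__main__":
    for low, high in shard_ranges(2):
        print(low, high)
```

Within one slice the low bits still increase, so each slice is appended to in order; the large jumps happen only between slices.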

| username: forever

Performance does improve, because hotspots are reduced. You only need to scatter data when you are sure a large amount of data will be written, so Region merging is generally not necessary.

| username: myzz

There are two places in this document that feel a bit ambiguous. Can you explain them, please?

| username: tidb菜鸟一只

There shouldn’t be any ambiguity, right? The first sentence means that with only a single replica, each Region is stored on just one node. The second sentence means that in practice there are multiple replicas, so while the leader of each Region sits on one node, its followers are stored on other nodes.
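
The two cases can be pictured with a toy placement function. It is only a sketch: the round-robin rule and the names `place_replicas` and `replicas_per_node` are invented for illustration, whereas in a real cluster PD decides where replicas and leaders go.

```python
from collections import Counter
from typing import Dict, List


def place_replicas(regions: int, nodes: int, replicas: int) -> Dict[int, List[int]]:
    """Map each Region to the nodes holding its replicas; the first node is the leader."""
    if replicas > nodes:
        raise ValueError("cannot place more replicas than there are nodes")
    # Assumed round-robin placement; each replica of a Region goes to a distinct node.
    return {r: [(r + k) % nodes for k in range(replicas)] for r in range(regions)}


def replicas_per_node(placement: Dict[int, List[int]], nodes: int) -> List[int]:
    """Count how many replicas (leader or follower) each node holds."""
    counts = Counter(n for holders in placement.values() for n in holders)
    return [counts[n] for n in range(nodes)]


if __name__ == "__main__":
    for replicas in (1, 3):
        placement = place_replicas(regions=4, nodes=5, replicas=replicas)
        print(replicas, replicas_per_node(placement, nodes=5))
```

With a single replica, 4 Regions on 5 nodes leave one node empty in this toy model; with 3 replicas, the node without a leader still holds followers and therefore still receives replicated writes.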
