TiDB Optimization

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB优化

| username: TiDBer_CkS2lbTx

In TiDB, specifying the primary key type as AUTO_RANDOM when creating a clustered index table can scatter regions to prevent write hotspots. However, in my business scenario, the development requires using randomly generated numbers from the Snowflake algorithm as the primary key. Will these randomly generated numbers also automatically distribute across different regions in TiDB, or will they still fill up one region before writing to the next?

| username: 路在何chu | Original post link

Check his table.

| username: 像风一样的男子 | Original post link

In your scenario, you can’t use AUTO_RANDOM type primary keys, so the data cannot be automatically distributed across different regions.

| username: residentevil | Original post link

I remember there is an SQL command to check which REGIONS the current table is distributed on.

| username: redgame | Original post link

Automatic distribution, but it needs to be tested.

| username: TiDBer_aaO4sU46 | Original post link

Yes, it will automatically distribute.

| username: TiDBer_CkS2lbTx | Original post link

Which SQL statement can you show me?

| username: chenhanneu | Original post link

First, a region will be written, then split, and split again. After the Snowflake algorithm, writes and queries will automatically be distributed across multiple regions. With auto-increment, even after splitting, writes will still only occur within a single region.

| username: TiDBer_RjzUpGDL | Original post link

It won’t distribute automatically, right?

| username: tidb菜鸟一只 | Original post link

In TiDB, when creating a clustered index table and specifying the primary key type as AUTO_RANDOM, the randomly generated numbers using the Snowflake algorithm will automatically be distributed across different regions in TiDB. However, initially, when there is only one region, this region will definitely be on a single node. As the data volume increases, it will gradually split into multiple regions. If you want to avoid write hotspots from the beginning, you can pre-split the table. Please refer to the steps here: Split Region 使用文档 | PingCAP 文档中心

For example, SPLIT TABLE t BETWEEN (0) AND (1000000000) REGIONS 16;

At this point, your randomly generated Snowflake ID combined with your table_id becomes your key. Due to the randomness of the Snowflake ID, it will naturally fall into the multiple pre-split regions, thus avoiding write hotspots.

| username: 江湖故人 | Original post link

The key of TiKV contains the primary key Key=tablePrefix{tableID}_recordPrefixSep{rowID}. It can be considered that in a clustered table with a primary key, the Region is sorted by the primary key of the table. Therefore, as long as the primary key is not continuous, there will be no write hotspot. The random numbers generated by the Snowflake algorithm and the auto-increment primary key using AUTO_RANDOM produce the same effect.

| username: 胡杨树旁 | Original post link

You can use non-clustered tables and scatter SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS when creating the table.

| username: 江湖故人 | Original post link

A non-clustered table will have an additional hidden column _tidb_rowid, which might be a bit wasteful :thinking:

| username: TiDBer_CkS2lbTx | Original post link

For non-clustered tables, even if there is a hidden primary key column, there will ultimately be an additional table lookup, right?

| username: forever | Original post link

The primary key itself is already dispersed. If a non-clustered table is used, _tidb_rowid will become a hotspot, and another solution will be needed to address the _tidb_rowid hotspot.

| username: forever | Original post link

Use this to pre-split the regions into multiple parts, combined with the Snowflake algorithm, and it will work. The role of Snowflake is the same as auto_random.

| username: zhanggame1 | Original post link

Adding a primary key to a non-clustered table will significantly reduce write performance.

| username: 小于同学 | Original post link

You can test it out.

| username: 小龙虾爱大龙虾 | Original post link

The Snowflake algorithm increases monotonically over time and cannot solve the hotspot issue. Check the heatmap after testing.

| username: TiDBer_rvITcue9 | Original post link

Study and learn.