auto_random became a formal feature in 4.0.3, but 4.0.3 should not have clustered primary keys yet. Does this mean that auto_random is not supported in version 4.0?
The primary key of a non-clustered table with bigint is independent of the table, similar to Oracle’s heap table and primary key, where the rowid is automatically generated by the system. A clustered table has the primary key and the table together, similar to MySQL. Before the appearance of clustered tables in TiDB, only primary keys of type int were supported as ‘clustered tables’. After the appearance of clustered tables, other data types are supported.
First, understand the storage formats of clustered and non-clustered tables and indexes, then understand the requirements of auto_random, and this issue will be clear.
In version 4.0, there is no concept of clustered index, but in practice, a table with a bigint primary key is an index
After introducing the concept of clustered index, if a table with a bigint primary key is specified as nonclustered, then the primary key index and the table are two separate objects; if specified as clustered, then the clustered primary key index and the table are one
What I understand is that auto_random is an ID generation algorithm. Actually, the ID type can be set to either int or bigint, but the IDs generated by auto_random are usually very large. If you use the int type, it might exceed its limit, so it is recommended to use the bigint type.
But what does this have to do with whether the table is nonclustered or clustered? Why can’t auto_random be set for nonclustered tables in version 6.0?
One is a table attribute, and the other is an ID generation algorithm. Is there a connection between the two?
The requirement for bigint is indeed related to the range.
Generally speaking, auto_random is just generating a random number and has nothing to do with whether it is a cluster or not. However, in TiDB, the data range is divided into regions according to the key range. The format of the key is t_(table_id)r(rowid). If the primary key is of int type, then the primary key is rowid. If it is not an int type primary key or a non-clustered table, then rowid is an automatically assigned incremental value by the system. The purpose of auto_random is to write data into different region ranges by generating random numbers. If it is a non-clustered table or a non-int primary key, since rowid is an incremental value rather than a random value generated by auto_random, even if auto_random generates random numbers, the actual data will always be written to the last region, causing a hotspot.
But there’s still a question: how does version 4.0 use auto_random to scatter hotspots? Or is the non-clustered storage format in version 4.0 different from version 6.0?
The implementation of auto_random is the same, and there is no concept of clustered index in version 4.0. After the introduction of cluster, non-int columns can also be set as cluster tables.
TiDB region/key format, you can refer to the official website and read the article on computing.
The storage format differences between clustered and non-clustered tables. It is recommended to understand Oracle’s heap tables and IOT tables for comparison.
Compare the storage format of bigint primary key in version 4.0 with the format of bigint clustered tables to see if there are any differences.