Auto_random is not supported when the primary key type is nonclustered?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 当主键类型为nonclustered时,不支持auto_random?

| username: lindoubled

The document is as follows:


auto_random became a formal feature in 4.0.3, but 4.0.3 should not have clustered primary keys yet. Does this mean that auto_random is not supported in version 4.0?

| username: 啦啦啦啦啦 | Original post link

4.0 supports it, but 4.0 does not yet have the concept of clustered index tables.

| username: xfworld | Original post link

Just upgrade the version, not only does it fix many bugs, but it also adds some stable features. Isn’t that great?

| username: h5n1 | Original post link

The primary key of a non-clustered table with bigint is independent of the table, similar to Oracle’s heap table and primary key, where the rowid is automatically generated by the system. A clustered table has the primary key and the table together, similar to MySQL. Before the appearance of clustered tables in TiDB, only primary keys of type int were supported as ‘clustered tables’. After the appearance of clustered tables, other data types are supported.

| username: lindoubled | Original post link

It looks like there is an error in the documentation?

| username: 啦啦啦啦啦 | Original post link

No problem, just choose the corresponding version of the documentation for different versions.

| username: h5n1 | Original post link

First, understand the storage formats of clustered and non-clustered tables and indexes, then understand the requirements of auto_random, and this issue will be clear.

| username: lindoubled | Original post link

– The question is why version 4.0 allows using auto_random with non-clustered indexes, but version 6.0 does not?

Version 4.0 does not have clustered indexes.

| username: h5n1 | Original post link

  1. auto random requires a bigint primary key
  2. In version 4.0, there is no concept of clustered index, but in practice, a table with a bigint primary key is an index
  3. After introducing the concept of clustered index, if a table with a bigint primary key is specified as nonclustered, then the primary key index and the table are two separate objects; if specified as clustered, then the clustered primary key index and the table are one
| username: lindoubled | Original post link

What is the relationship between auto_random and noncluster and cluster?

| username: lindoubled | Original post link

If version 6.0 uses non-cluster, then the behavior should be the same as version 4.0?

| username: h5n1 | Original post link

The relationship is that you should take a screenshot and look at that passage. Can you understand the structure of clustered indexes in MySQL?

| username: lindoubled | Original post link

What I understand is that auto_random is an ID generation algorithm. Actually, the ID type can be set to either int or bigint, but the IDs generated by auto_random are usually very large. If you use the int type, it might exceed its limit, so it is recommended to use the bigint type.

But what does this have to do with whether the table is nonclustered or clustered? Why can’t auto_random be set for nonclustered tables in version 6.0?

One is a table attribute, and the other is an ID generation algorithm. Is there a connection between the two?

| username: lindoubled | Original post link

Noncluster means there is an additional index. In version 4.0, the storage should all be noncluster, right?

| username: h5n1 | Original post link

  1. The requirement for bigint is indeed related to the range.
  2. Generally speaking, auto_random is just generating a random number and has nothing to do with whether it is a cluster or not. However, in TiDB, the data range is divided into regions according to the key range. The format of the key is t_(table_id)r(rowid). If the primary key is of int type, then the primary key is rowid. If it is not an int type primary key or a non-clustered table, then rowid is an automatically assigned incremental value by the system. The purpose of auto_random is to write data into different region ranges by generating random numbers. If it is a non-clustered table or a non-int primary key, since rowid is an incremental value rather than a random value generated by auto_random, even if auto_random generates random numbers, the actual data will always be written to the last region, causing a hotspot.
| username: lindoubled | Original post link

Thank you, thank you, I have some ideas.

But there’s still a question: how does version 4.0 use auto_random to scatter hotspots? Or is the non-clustered storage format in version 4.0 different from version 6.0?

| username: h5n1 | Original post link

The implementation of auto_random is the same, and there is no concept of clustered index in version 4.0. After the introduction of cluster, non-int columns can also be set as cluster tables.

| username: lindoubled | Original post link

The documentation says that it became an official feature in version 4.0.3. Does that mean the cluster table feature started in version 4.0.3?

| username: h5n1 | Original post link

Here is what you need to understand first:

  1. TiDB region/key format, you can refer to the official website and read the article on computing.
  2. The storage format differences between clustered and non-clustered tables. It is recommended to understand Oracle’s heap tables and IOT tables for comparison.
  3. Compare the storage format of bigint primary key in version 4.0 with the format of bigint clustered tables to see if there are any differences.
| username: lindoubled | Original post link

You are right. In version 4.0, if the primary key is set to an integer type, the table is clustered; for other types, it is non-clustered.

In version 5.0, you can set other types of clustered tables.