Questions about Clustered and Non-Clustered Tables

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于聚簇表和非聚簇表的问题

| username: TiDBer_Lee

[TiDB Version] v6.5.5
With the upgrade of the TiDB version, some concepts about clustered and non-clustered tables have become blurred during the use of the TiDB database. I have the following questions:

Regarding clustered tables:

  • The key-value pair of a clustered table is: primary key column data (key) - row data (value)
    • My question 1: If the primary key of the table is not an integer or the primary key is not a single column, is the key-value pair of the clustered table still like this? Is there a creation of tidb_rowid?
    • My question 2: How is the key-value pair of the secondary index of the clustered table composed? If the primary key is not an integer or the primary key is not a single column, will it change?

Regarding non-clustered tables:

  • The key-value pair of a non-clustered table:
    • tidb_rowid (key) - row data (value)
    • Primary key column data (key) - _tidb_rowid (value)
  • My question is, if the primary key of this non-clustered table is an integer, is the composition of the key-value pair in the secondary index of this table optimized to the primary key value or is it still tidb_rowid? According to the composition of the table data, tidb_rowid should be more efficient, but the document https://docs.pingcap.com/zh/tidb/stable/tidb-computing#索引数据和-key-value-的映射关系 does not distinguish between the two table types.
| username: zhanggame1 | Original post link

1: If the primary key of the table is not an integer or there is more than one primary key, in higher versions it is still a key-value pair, where the key is the primary key and the value is the data, without creating tidb_rowid.
2: The second one uses _tidb_rowid.

Testing
image

For non-clustered tables, the key used is _tidb_rowid.

| username: TiDBer_Lee | Original post link

For a clustered table’s secondary index, if the primary key is not an integer, is _tidb_rowid used?

| username: changpeng75 | Original post link

The index of a clustered table is unrelated to the rowid.
Whether it is a clustered table or a non-clustered table, the key of the index is the value of the index column, and the key value is the value of the key in the data key-value pair. For a clustered table, it is the primary key, and for a non-clustered table, it is the rowid.

| username: forever | Original post link

The difference between clustered and non-clustered tables is that the primary key of a clustered table is part of the Key in the KV mapping, while the Key of a non-clustered table is composed of the internally assigned _tidb_rowid by TiDB. Essentially, the primary key is a unique index, and only non-clustered tables have _tidb_rowid.

When creating a table, you can explicitly specify whether it is a clustered or non-clustered table. For statements that do not explicitly specify this keyword, the default behavior is influenced by the system variable @@global.tidb_enable_clustered_index. This variable has three values:

  • OFF means all primary keys use non-clustered indexes by default.
  • ON means all primary keys use clustered indexes by default.
  • INT_ONLY means the behavior is controlled by the configuration item alter-primary-key. If this configuration item is set to true, all primary keys use non-clustered indexes by default; if set to false, primary keys composed of a single integer type column use clustered indexes by default, while other types of primary keys use non-clustered indexes by default.
| username: zhanggame1 | Original post link

Clustered tables do not have _tidb_rowid, you can test it, it is not related to whether the primary key is an integer.

| username: TIDB-Learner | Original post link

Rowid is there by default, it’s just a matter of whether it’s used or not, and when it’s used.

| username: linnana | Original post link

Non-clustered tables have a hidden column called rowid regardless of whether they have a primary key.

| username: zhanggame1 | Original post link

Clustered tables do not have _tidb_rowid, you can try querying and it will report an error.

| username: FutureDB | Original post link

If the primary key of the clustered table is not an integer or the primary key is not a single column, the clustered table does not create _tidb_rowid. You can try creating a clustered table with a primary key that is not an integer or not a single column, and then query _tidb_rowid to see if it reports an error; I understand that at this time its key value has become its primary key value (currently there is no specific evidence to certify this, I don’t know if any experts can prove it :grinning:).

I understand the same as above, the RowID part of the key-value pairs of the secondary index of the clustered table also becomes the primary key value.

| username: dba远航 | Original post link

Clustered tables use primary keys, while non-clustered tables use tidb_rowid.

| username: 胡杨树旁 | Original post link

TiDB only has the concepts of clustered indexes and non-clustered indexes, right? It seems that there is no such term as secondary indexes.

| username: forever | Original post link

The indexes other than the primary key in a clustered table are secondary indexes.

| username: 这里介绍不了我 | Original post link

The wording is not important, it depends on how you understand it.

| username: zhanggame1 | Original post link

Of course, it’s important. Clustered table data is arranged in order by the primary key, and the primary key does not have a physical index. Secondary indexes have additional physical indexes.

| username: zhanggame1 | Original post link

TiDB has the concept of secondary indexes. Refer to the official documentation:


Creating Secondary Indexes | PingCAP Documentation Center

| username: 胡杨树旁 | Original post link

Non-clustered tables should also have secondary indexes for all indexes other than the primary key, right?

| username: 胡杨树旁 | Original post link

Can it be understood that clustered tables and non-clustered tables, apart from the primary key, are all secondary indexes?

| username: forever | Original post link

A clustered table is stored according to the primary key, and the secondary index points to the primary key’s key, hence it’s called a secondary index.
In a non-clustered table, data is stored as _tidb_rowid, and both the primary key and ordinary indexes point to _tidb_rowid, so it shouldn’t be called a secondary index.

| username: zhanggame1 | Original post link

Clustered tables have secondary indexes except for the primary key.
For non-clustered tables, I believe they are all secondary indexes. The primary key is just a unique non-null index, and the primary key is hidden as _tidb_rowid.