Index Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 索引问题

| username: rw12306

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] V5.2
[Reproduction Path]
When using UUID as the primary key, does the clustered index become invalid, or will an implicit auto-increment rowid be used to maintain the primary key?

Does this sentence mean that using a clustered index can support both strings and integers?

| username: xfworld | Original post link

Yes, versions after 5.X all support it.

Previous versions were more friendly to bigint… In fact, to meet the requirements of data sharding, UUID would be more suitable.
bigint also has many compatibility treatments, such as the auto_random method…

| username: ealam_小羽 | Original post link

rowid is an identifier for row records. If the primary key is an integer, the primary key will be used as the rowid.

| username: rw12306 | Original post link

Okay, using UUID means maintaining a rowid. I thought it was always incrementing, but now I see it is unique within the table.

Another question, I previously used UUIDs without a clustered index (it was version 4.0 before, and I recently upgraded to 6.1; v4.0 does not support clustered indexes). How can I change it to a clustered index now? The official documentation says modification is not supported.

| username: h5n1 | Original post link

A clustered index is the primary key, and there won’t be an additional primary key. Only tables with non-clustered indexes have an additional rowid.

| username: rw12306 | Original post link

Isn’t it mentioned here that a non-integer primary key will use an implicit auto-increment rowid? Or does it mean that an implicit auto-increment rowid will only be used if I explicitly set SHARD_ROW_ID_BITS when creating the table? If SHARD_ROW_ID_BITS is not explicitly set, the primary key will be used directly (if the primary key is a UUID, then the UUID will be used).

| username: h5n1 | Original post link

This was the situation before clustered tables were introduced:

create table t(id varchar(10), name varchar(10), primary key (id));
insert into t values ('10', 'a');
[db16:4000]> select _tidb_rowid, t.* from t;
+-------------+----+------+
| _tidb_rowid | id | name |
+-------------+----+------+
|           1 | 10 | a    |
+-------------+----+------+
create table t1 (id varchar(10), name varchar(10), primary key (id) **clustered**);
insert into t1 values ('10', 'b');
[db16:4000]> select _tidb_rowid, t1.* from t1;
ERROR 1054 (42S22): Unknown column '_tidb_rowid' in 'field list'
| username: rw12306 | Original post link

It means that as long as there is a primary key, rowid will not be used.

| username: h5n1 | Original post link

Before clustered indexes were introduced, a rowid was added by default to all primary keys except for those of the Int type. After the introduction of clustered indexes, as long as it is specified as clustered, no rowid is added.

| username: rw12306 | Original post link

So, is the primary key in my table a clustered index? I still have a rowid, but when querying data, I can retrieve all data using the primary key id. Previously, I didn’t specify clustered, but there was no table-back operation when querying data. This primary key should be a clustered index, right?

It’s obvious that this is using a clustered index. The primary key of this table is FILE_NUMBER.

| username: h5n1 | Original post link

The ID is of character type, and if cluster index is not specified, the primary key is an independent index. PointGet means accessing through the primary key or unique key.

| username: rw12306 | Original post link

Do primary keys and unique keys also store row data? What is the difference between primary keys and clustered indexes in storing data? Do primary keys store row data?
Another question is, if I currently don’t have a clustered index, how can I make it a clustered index? The official website says modification is not supported.

| username: h5n1 | Original post link

| username: rw12306 | Original post link

  1. If it is a clustered index, find the corresponding Value through tablePrefix{TableID}_recordPrefixSep{index value}.
    Key: tablePrefix{TableID}_recordPrefixSep{RowID}
    Value: [col1, col2, col3, col4]
  2. If it is a primary key index or a unique index, find the RowID in the Value through tablePrefix{tableID}_indexPrefixSep{indexID}_index value, and then use RowID to find the corresponding Value through tablePrefix{TableID}_recordPrefixSep{RowID}.
    Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
    Value: RowID
  3. If it is a normal secondary index, it matches the range of {RowID} through tablePrefix{TableID}_indexPrefixSep{IndexID}_index value, and then queries the value through RowID using tablePrefix{TableID}_recordPrefixSep{RowID}.
    Key: tablePrefix{TableID}_indexPrefixSep{IndexID}indexedColumnsValue{RowID}
    Value: null

I am not sure if my understanding is correct…

| username: h5n1 | Original post link

Basically, that’s it.

| username: rw12306 | Original post link

How can I make the primary key of the table a clustered index now? Do I need to delete the table and recreate it?

| username: h5n1 | Original post link

You can only rebuild it.