Can non-int type fields in TiDB be used as primary keys for clustered index tables?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 非int 类型的字段是否可以当聚簇索引表的主键

| username: Raymond

In the DEV COURSE 3: Unique Features and Transaction Control of TiDB, it is mentioned that if the primary key is not of the int data type, TiDB will create a non-clustered index for the table. However, based on my tests, when creating a table with create table t1(id varchar(10) primary key, a varchar(10)); and setting the primary key field as varchar, checking with show create table t1 shows that it is a clustered index table. I am not sure if the course content is incorrect?

| username: ddhe9527 | Original post link

When creating a table, if the primary key is not explicitly specified as CLUSTERED, whether it is a clustered table is determined by the system variable tidb_enable_clustered_index. Its scope is SESSION/GLOBAL, and the default value is INT_ONLY.

  • INT_ONLY: If the primary key consists of a single integer type field, it is an IOT table; other types are non-IOT tables.

  • OFF: Non-IOT table

  • ON: Always an IOT table

| username: Raymond | Original post link

Thank you for the reply. So, there is a parameter that controls whether non-int fields can serve as primary keys.

| username: jansu-dev | Original post link

  1. After version v5, yes (there is a parameter that controls whether non-int fields can serve as primary keys), but it’s not rigorous. It should be that regardless of whether it is an int, it can be a primary key, but the table with the primary key may or may not be a clustered table. Clustered tables have better performance and data row positioning capabilities.
  2. This issue actually traces back to the evolution of TiDB clustered tables. In v4 and earlier versions, there was no such parameter, meaning: int primary keys were clustered tables, and others were non-clustered tables. This was determined by the key encoding format in the data storage model.
  3. However, as evolution progressed, non-clustered tables exhibited certain performance issues, often requiring table lookups (finding row_id from the index and then looking up the table, which increases network consumption and logical steps). This could have been directly resolved with point queries (significantly improving scenarios that emphasize TP). In v5, TiDB supported the functionality of clustered tables with other types as primary keys. To maintain compatibility with previous versions, this parameter was introduced.