In scenarios migrating from MySQL, is it recommended to use clustered tables or non-clustered tables for table creation, and what are the advantages and disadvantages of each?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 请问从MySQL迁移过来的场景,建表建议使用聚簇表还是非聚簇表,各有啥优缺点吗

| username: terry0219

If TiDB is migrated from MySQL, it is recommended to use non-clustered tables; if it is a new table, it is recommended to use clustered tables. Is this understanding correct?

| username: DBAER | Original post link

This has an introduction.

| username: tidb菜鸟一只 | Original post link

It depends on the business requirements and has little to do with whether it is migrated from MySQL.

| username: 友利奈绪 | Original post link

It generally depends on the requirements.

| username: mono | Original post link

This depends on your original table structure and business situation. However, if possible, it is recommended to use a clustered index table.

| username: DBAER | Original post link

This document has a comparison and is very detailed:

| username: 考试没答案 | Original post link

We migrated clustered tables. By default, they are also clustered tables. The characteristic of clustered tables is that they are ordered, which seems to have an advantage for key-value.

| username: zhanggame1 | Original post link

Clustered tables in TiDB’s underlying storage use a key-value format, where the key is the table ID plus the primary key value. This way, a separate primary key index is not needed.

Advantages: For scenarios with high write performance requirements, clustered tables have much higher write performance compared to non-clustered tables with a primary key index. Updating data with a primary key in the WHERE clause is also faster than with non-clustered tables.

MySQL itself also uses clustered tables, so it is recommended to use clustered tables when migrating. Currently, non-clustered tables seem to have little significance, and TiDB does not use them by default.

| username: TiDBer_QYr0vohO | Original post link

Recommend clustered tables

| username: Hacker_PtIIxHC1 | Original post link

Learned.

| username: stephanie | Original post link

It still depends on the business requirements. Generally, it is recommended to create clustered tables if possible.

| username: terry0219 | Original post link

If the migrated table has a primary key ID with an auto-increment attribute, and the primary key also has business attributes, how can the write hotspot issue be resolved?

| username: terry0219 | Original post link

If the migrated table has a primary key ID with an auto-increment attribute, and the primary key also has business attributes, what is the solution to the write hotspot issue?

| username: tidb菜鸟一只 | Original post link

Has it already been migrated to a clustered table? If it is a non-clustered table, you can use the SHARD_ROW_ID_BITS parameter to scatter the data and alleviate the write hotspot issue.

| username: terry0219 | Original post link

Yes, I want to use clustered tables. In this scenario, to solve the write hotspot issue, it seems there aren’t many good methods at the database level, right?

| username: tidb菜鸟一只 | Original post link

Clustered tables with auto-increment primary keys cannot avoid write hotspots if they cannot be changed to auto_random. This is because their keys (RowID) are in the format tablePrefix{ TableID }_recordPrefixSep{ Col1 }, which means the keys increase sequentially and can only be placed in one region, making it impossible to scatter them.

| username: terry0219 | Original post link

For a table that already has existing data, can AUTO_INCREMENT be directly changed to AUTO_RANDOM?

| username: tidb菜鸟一只 | Original post link

Sure.

| username: terry0219 | Original post link

Got it, thanks~

| username: zhanggame1 | Original post link

To be consistent with MySQL’s auto-increment behavior, you need to add the AUTO_ID_CACHE=1 parameter when creating the table. As for hotspots, we can address them when they occur. In my tests on an NVMe drive, the impact is not significant.