Does a clustered index table in TiDB consume more storage space compared to a non-clustered index table?

translator_bot · June 21, 2024, 10:16pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB 聚簇索引表比非聚簇索引表是否会占用更多的存储空间

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.0
[Encountered Problem: Problem Description and Impact]: Is there any test data for the second disadvantage mentioned in the official documentation?

translator_bot · June 21, 2024, 10:16pm

| username: tidb菜鸟一只 | Original post link

The difference between clustered index tables and non-clustered index tables in TiDB is essentially whether the key is composed of the table ID and primary key or _tidb_rowid. When using data types larger than 64 bits as the primary key and having multiple secondary indexes, it may occupy more space. This is because it means the key of the table will be larger than 64 bits, and each secondary index needs to store the key for each column. The more secondary indexes there are, the longer the key, which means the key is stored more times, thus occupying more space.

translator_bot · June 21, 2024, 10:16pm

| username: residentevil | Original post link

For example: The primary key is a composite primary key consisting of two fields (a, b), and both fields are 64-bit bigints. According to your understanding, such a primary key actually exceeds 64 bits [64+64=128], so the storage capacity increases, and this information will be saved in the writeCF, right?

translator_bot · June 21, 2024, 10:16pm

| username: zhanggame1 | Original post link

Clustered indexes do not take up additional space. In my tests, using clustered indexes is twice as fast as not using them when I/O performance is weak because it eliminates one index I/O operation.

translator_bot · June 21, 2024, 10:16pm

| username: residentevil | Original post link

Look at the information in the above reply and the official documentation.

translator_bot · June 21, 2024, 10:16pm

| username: redgame | Original post link

Yes, it will.

translator_bot · June 21, 2024, 10:16pm

| username: tidb菜鸟一只 | Original post link

Yes, a normal clustered index table is faster than a non-clustered index table because its key corresponds directly to the primary key, allowing the key to be determined directly through the primary key. However, if the primary key is very long, it will result in a long key, and in cases where there are many secondary indexes, it may indeed take up more space. But I think as long as it’s not excessively long, the extra space usage is negligible. After all, compared to storage overhead, the improvement in system performance is more important. Therefore, I still recommend using clustered index tables more often.

translator_bot · June 21, 2024, 10:16pm

| username: residentevil | Original post link

I will test the cost difference between these two types of tables in my own business scenario.

translator_bot · June 21, 2024, 10:16pm

| username: zhanggame1 | Original post link

You can try it and see that the performance difference is significant when there is only a primary key index.

translator_bot · June 21, 2024, 10:16pm

| username: residentevil | Original post link

OK, I need to test it this way, thanks for the reminder