Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TIDB 聚簇索引表比非聚簇索引表是否会占用更多的存储空间
[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.0
[Encountered Problem: Problem Description and Impact]: Is there any test data for the second disadvantage mentioned in the official documentation?
The difference between clustered index tables and non-clustered index tables in TiDB is essentially whether the key is composed of the table ID and primary key or _tidb_rowid. When using data types larger than 64 bits as the primary key and having multiple secondary indexes, it may occupy more space. This is because it means the key of the table will be larger than 64 bits, and each secondary index needs to store the key for each column. The more secondary indexes there are, the longer the key, which means the key is stored more times, thus occupying more space.
For example: The primary key is a composite primary key consisting of two fields (a, b), and both fields are 64-bit bigints. According to your understanding, such a primary key actually exceeds 64 bits [64+64=128], so the storage capacity increases, and this information will be saved in the writeCF, right?
Clustered indexes do not take up additional space. In my tests, using clustered indexes is twice as fast as not using them when I/O performance is weak because it eliminates one index I/O operation.
Look at the information in the above reply and the official documentation.
Yes, a normal clustered index table is faster than a non-clustered index table because its key corresponds directly to the primary key, allowing the key to be determined directly through the primary key. However, if the primary key is very long, it will result in a long key, and in cases where there are many secondary indexes, it may indeed take up more space. But I think as long as it’s not excessively long, the extra space usage is negligible. After all, compared to storage overhead, the improvement in system performance is more important. Therefore, I still recommend using clustered index tables more often.
I will test the cost difference between these two types of tables in my own business scenario.
You can try it and see that the performance difference is significant when there is only a primary key index.
OK, I need to test it this way, thanks for the reminder