Does enabling clustered_index in TiDB consume a lot of storage space?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB 开启clustered_index会占用很大的存储空间么

| username: residentevil

[Encountered Problem: Problem Phenomenon and Impact]: Does enabling clustered_index in TiDB consume a lot of storage space? Is this index stored in the writeCF of TiKV?

| username: 有猫万事足 | Original post link

With a clustered index, the key of the stored key-value pair is the table's primary key value itself.
With a nonclustered index, the key is an automatically generated unique `_tidb_rowid`, and the primary key is maintained as a separate index.

In other words, a clustered index is more compact than a nonclustered index, and a primary key query needs one fewer table lookup.

Document location.
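
To make the two key layouts concrete, here is a minimal DDL sketch (table and column names are made up for illustration); in current TiDB versions you can state the choice explicitly:

```sql
-- Clustered: the row key is the primary key itself, no separate PK index.
CREATE TABLE orders_clustered (
    id BIGINT PRIMARY KEY CLUSTERED,
    amount DECIMAL(10, 2)
);

-- Nonclustered: rows are keyed by a hidden _tidb_rowid,
-- and the primary key is maintained as a separate index.
CREATE TABLE orders_nonclustered (
    id BIGINT PRIMARY KEY NONCLUSTERED,
    amount DECIMAL(10, 2)
);
```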

| username: residentevil | Original post link

Question 1: Will the data bloat after enabling clustered_index by default?
Question 2: From the monitoring, writeCF occupies 1/3 of the space of defaultCF. Why does it take up so much storage?

| username: 有猫万事足 | Original post link

For question 1, the answer is no; refer to the previous reply.

For question 2, here is the role of the column families in each RocksDB instance:

TiKV uses the `Column Families` (CF) feature of RocksDB.

* The default RocksDB instance stores KV data in three internal CFs: `default`, `write`, and `lock`.
  * The `default` CF stores the actual data, and its corresponding parameters are located in the `[rocksdb.defaultcf]` section;
  * The `write` CF stores the version information (MVCC) of the data and index-related data, with relevant parameters located in the `[rocksdb.writecf]` section;
  * The `lock` CF stores lock information, and the system uses default parameters.
* The Raft RocksDB instance stores Raft logs.
  * The `default` CF mainly stores Raft logs, and its corresponding parameters are located in the `[raftdb.defaultcf]` section.
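
If you want to look at the `writecf` parameters mentioned in those sections from the SQL layer instead of tikv.toml, one option (assuming a TiDB node is available) is roughly:

```sql
-- List the TiKV write CF parameters; the exact set of names
-- depends on the TiKV version.
SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'rocksdb.writecf%';
```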

So, writeCF contains index data plus MVCC version information. There are several possible reasons for this CF to grow:

1. Frequent updates, which produce more MVCC versions.
2. A long GC retention setting, which keeps more MVCC versions around.
3. Many indexes, which means more index data.

Combining the two questions: if you are asking whether clustered_index data occupies writeCF, the answer is yes. Although a clustered index saves one index, its MVCC information still lives in writeCF. However, a clustered index definitely won't cause data bloat; it should occupy no more space than a nonclustered index. If the data does grow larger, investigate the update frequency and the GC retention setting.
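
As a rough way to see how much of the space is index data (cause 3), the statistics-based size estimates in `information_schema` can help; a sketch, with `your_db` as a placeholder schema name:

```sql
-- Estimated split between row data and index data per table;
-- these are estimates from statistics, not exact on-disk CF sizes.
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_db';
```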

| username: residentevil | Original post link

We previously used the RocksDB engine, which by default has only defaultCF, so both data and indexes were stored in defaultCF. With the same table structure and data volume, after migrating to TiDB we found that the data expanded by 40%, and we want to identify the reason. Can you offer some insight? Roughly, the expansion corresponds to writeCF growing by 40%. :grin:

| username: 有猫万事足 | Original post link

No wonder, so you used to use RocksDB for pure KV, and now you are directly calling TiKV using the TiKV client, right?
It’s most likely a GC issue. This is a common problem when using TiKV alone.

When deploying TiKV alone, GC needs to be manually called from the TiKV client, or you can deploy a TiDB separately for GC.

Here is the GC documentation for the Go client. You can refer to it.

| username: residentevil | Original post link

Is there a view or tool that can show the capacity ratio of MVCC and INDEX in writeCF? This would help pinpoint whether the space amplification is caused by GC or if it’s due to large INDEX space usage. Currently, we are performing a full data migration, so I personally feel that it’s not an issue caused by GC.

| username: h5n1 | Original post link

TiKV has its own key format. Although it also uses RocksDB at the underlying level, you can't compare the two directly. Additionally, row values that do not exceed a certain length (255 bytes, if I remember correctly) are stored directly in the write CF. Moreover, MVCC information is also written to the write CF, which is something plain RocksDB does not have.

| username: 有猫万事足 | Original post link

I’m not quite sure how you are using TiKV.

If you previously used RocksDB for KV operations and now use TiKV for KV operations without TiDB, it’s most likely a GC issue.
If you previously used RocksDB for KV operations and now use SQL to operate TiKV, then if the writeCF is large, I don’t think there’s any problem.

| username: residentevil | Original post link

We directly deployed a TiDB cluster and then imported the MySQL RocksDB data, and found that the data expanded. By the way, is there a specific link for GC-related tuning, preferably a best-practices guide? Could you please help check?

| username: 有猫万事足 | Original post link

That is not a scenario where the tikv-client directly calls TiKV.

You can check the progress of the GC safepoint in the TiKV Auto GC SafePoint graph under the GC section in tikv-details on Grafana. As long as it advances according to the GC interval over time, it should be fine.
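
Besides Grafana, a quick way to confirm the GC safepoint keeps advancing is to query the `mysql.tidb` table from TiDB (a sketch; the exact rows can differ slightly between versions):

```sql
-- tikv_gc_safe_point and tikv_gc_last_run_time should keep moving forward
-- as GC runs on schedule.
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM mysql.tidb
WHERE VARIABLE_NAME LIKE 'tikv_gc%';
```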

My current feeling is that the growth comes from moving from pure KV storage in RocksDB to transactional storage, where each key also carries MVCC version information.

As for the monitoring of the writeCF ratio you mentioned above, I have no clue either. Let’s see if any other experts know.

For GC-related content, you can check this summary post: 【SOP 系列 25】GC 常见问题排查 ([SOP Series 25] Troubleshooting common GC issues) - TiDB Q&A community

| username: residentevil | Original post link

Thank you very much for your response. The TiDB community is indeed impressive. I will continue to consult you if I have further questions. Regarding the issue of storage expansion, I will conduct the following tests:

Step 1: Remove the INDEX from the table and insert data to observe the changes in writeCF capacity.
Step 2: Optimize the GC configuration.
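
For step 2, a minimal sketch of checking and tightening the GC retention window (the `10m` value is only an example; in recent TiDB versions these are system variables, in older ones they live in `mysql.tidb`):

```sql
-- How long old MVCC versions are retained, and how often GC runs
SHOW VARIABLES LIKE 'tidb_gc_life_time';
SHOW VARIABLES LIKE 'tidb_gc_run_interval';

-- Shorten the retention window so old versions are reclaimed sooner
-- (pick a value that still covers your longest queries and backups)
SET GLOBAL tidb_gc_life_time = '10m';
```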

| username: 有猫万事足 | Original post link

The TiDB community is indeed very awesome. I totally agree. :joy:
Assuming you need to migrate from RocksDB to TiKV with similar storage/performance, personally, I feel that directly using the rawkv API of the tikv-client might be the lowest cost migration and transformation.
Going through the SQL layer of TiDB might result in significant changes in both performance and storage compared to the original.
Anyway, trying to find a suitable solution is indeed a painstaking task. Good luck. :fist:

| username: zhanggame1 | Original post link

How do you investigate data bloat?

| username: residentevil | Original post link

:+1::+1::+1:

| username: residentevil | Original post link

After migrating from native MySQL with RocksDB engine storing 1TB of data to TiDB, I found that the storage capacity has increased to 1.4TB :sweat_smile:

| username: redgame | Original post link

Different data expands by different amounts.

| username: zhanggame1 | Original post link

The current version of TiDB defaults to clustered tables, where data is organized in an ordered format based on the primary key. If there is only a primary key index, there should be no additional physical indexes.
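
A quick way to confirm whether an existing table is clustered, as a sketch (schema and table names are placeholders):

```sql
-- The primary key definition carries a CLUSTERED / NONCLUSTERED marker
SHOW CREATE TABLE your_db.your_table;

-- Or check the dedicated column in information_schema
SELECT TABLE_NAME, TIDB_PK_TYPE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'your_table';
```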

| username: zhanggame1 | Original post link

TiDB has 3 replicas. Are you deploying 3 TiKVs, with 1.4 TB on each of them?

| username: residentevil | Original post link

With a single TiKV replica it is 1.4 TB; with three replicas it would be 1.4 TB × 3. I will soon verify by removing the table indexes and re-importing the data for comparison.
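
To see how the space is actually distributed across TiKV stores (and whether the 1.4 TB already includes the replica factor), one option is roughly:

```sql
-- Per-store capacity and free space; with 3 replicas the total used space
-- across stores is roughly 3x the single-replica logical size.
SELECT ADDRESS, CAPACITY, AVAILABLE
FROM information_schema.TIKV_STORE_STATUS;
```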