The storage space usage of raft-engine exceeds expectations

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: raft-engine的存储空间占用大小超过预期

| username: wengsy

【TiDB Usage Environment】Test Environment
【TiDB Version】

select version();
+--------------------+
| version()          |
+--------------------+
| 8.0.11-TiDB-v7.4.0 |
+--------------------+

【Reproduction Path】
Repeatedly creating and deleting tables in a database.
【Encountered Problem: Phenomenon and Impact】
The raft-engine folder under the TiKV storage path takes up more space than expected.

> du -h --max-depth=1
36G     ./raft-engine
31G     ./tablets
4.0K    ./import
4.0K    ./tablet_snap
66G     .

The raft-engine folder contains a large number of files, about 130. The listing sorted by time is shown in the figure below.
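For anyone who wants to repeat this check without a screenshot, here is a minimal Python sketch that counts the files in a directory, sums their sizes, and reports how far apart the oldest and newest modification times are. The function name `summarize_dir` is hypothetical and not part of any TiKV tooling; it only looks at regular files directly inside the directory.

```python
import os

def summarize_dir(path):
    """Return (file_count, total_bytes, mtime_span_seconds) for regular files directly in path."""
    sizes, mtimes = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files are counted.
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                sizes.append(st.st_size)
                mtimes.append(st.st_mtime)
    # Span between the oldest and newest file, 0.0 for an empty directory.
    span = max(mtimes) - min(mtimes) if mtimes else 0.0
    return len(sizes), sum(sizes), span
```

Calling it on the raft-engine directory of one TiKV instance gives the file count, the total size in bytes, and the time span to compare against raft-log-gc-tick-interval.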



The relevant configuration of the TiDB cluster is as follows:

MySQL [(none)]> show config where name like '%raftstore.raft-log-gc%';
+---------+-------------------+-----------------------------------------------------+--------+
| Type    | Instance          | Name                                                | Value  |
+---------+-------------------+-----------------------------------------------------+--------+
| tikv    | 49.52.27.20:20162 | raftstore.raft-log-gc-count-limit                   | 10000  |
| tikv    | 49.52.27.20:20162 | raftstore.raft-log-gc-size-limit                    | 200MiB |
| tikv    | 49.52.27.20:20162 | raftstore.raft-log-gc-threshold                     | 50     |
| tikv    | 49.52.27.20:20162 | raftstore.raft-log-gc-tick-interval                 | 3s     |
| tikv    | 49.52.27.20:20161 | raftstore.raft-log-gc-count-limit                   | 10000  |
| tikv    | 49.52.27.20:20161 | raftstore.raft-log-gc-size-limit                    | 200MiB |
| tikv    | 49.52.27.20:20161 | raftstore.raft-log-gc-threshold                     | 50     |
| tikv    | 49.52.27.20:20161 | raftstore.raft-log-gc-tick-interval                 | 3s     |
| tikv    | 49.52.27.20:20160 | raftstore.raft-log-gc-count-limit                   | 10000  |
| tikv    | 49.52.27.20:20160 | raftstore.raft-log-gc-size-limit                    | 200MiB |
| tikv    | 49.52.27.20:20160 | raftstore.raft-log-gc-threshold                     | 50     |
| tikv    | 49.52.27.20:20160 | raftstore.raft-log-gc-tick-interval                 | 3s     |
| tiflash | 49.52.27.20:3930  | raftstore-proxy.raftstore.raft-log-gc-count-limit   | 73728  |
| tiflash | 49.52.27.20:3930  | raftstore-proxy.raftstore.raft-log-gc-size-limit    | 72MiB  |
| tiflash | 49.52.27.20:3930  | raftstore-proxy.raftstore.raft-log-gc-threshold     | 50     |
| tiflash | 49.52.27.20:3930  | raftstore-proxy.raftstore.raft-log-gc-tick-interval | 3s     |
+---------+-------------------+-----------------------------------------------------+--------+

Here raftstore.raft-log-gc-size-limit is 200MiB, far below the 36G actually used. raftstore.raft-log-gc-tick-interval is 3s, yet the timestamps of the raft log files in the directory differ by minutes, so the older raft log files should already have been eligible for deletion.

For now we have cleared the data with tiup clean --all, but I would like to know how the configuration items above determine the space used by the raft-engine directory, and how to limit that usage. Thank you!

Additionally, the configuration file of the cluster tikv is as follows:

  tikv:
    log.file.max-backups: 10
    log.level: error
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
    rocksdb.max-total-wal-size: 1
    storage.engine: partitioned-raft-kv

| username: xfworld | Original post link

Looking at these parameters, they should be helpful to you:

| username: wengsy | Original post link

Thank you for your reply!

We have seen these parameters and listed our cluster's values in the main post. Below are the values for a single node; the full list is in the main post.

+------+-------------------+-------------------------------------+--------+
| Type | Instance          | Name                                | Value  |
+------+-------------------+-------------------------------------+--------+
| tikv | 49.52.27.20:20162 | raftstore.raft-log-gc-count-limit   | 10000  |
| tikv | 49.52.27.20:20162 | raftstore.raft-log-gc-size-limit    | 200MiB |
| tikv | 49.52.27.20:20162 | raftstore.raft-log-gc-threshold     | 50     |
| tikv | 49.52.27.20:20162 | raftstore.raft-log-gc-tick-interval | 3s     |
+------+-------------------+-------------------------------------+--------+

We raised this question because the parameter values differ so much from the sizes we actually observed.

The documentation describes raft-log-gc-size-limit as a hard limit on the size of residual Raft logs. Its value in our cluster is 200MiB, yet the raft-engine directory uses 36G in total, which is a huge difference. Given that raft-log-gc-count-limit is described as “3/4 of the number of logs that a Region can hold” and the default of raft-log-gc-size-limit as “3/4 of the Region size”, raft-log-gc-size-limit seems to refer to the total size of all files rather than the size of a single log file. (If it referred to a single file, what we observed would be normal: each file is about 130MB, and there are fewer than 10000 files.)

Therefore, we would like to know whether raft-log-gc-size-limit limits the overall space or the size of a single file. And how is the space usage of the raft-engine directory bounded: by raft-log-gc-size-limit, by raft-log-gc-size-limit * raft-log-gc-count-limit, or by neither?
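To make these readings concrete, here is a small arithmetic sketch in Python that uses only the numbers posted in this thread. The last step assumes the limit applies per Region, which the “3/4 of the Region size” wording hints at but which is not confirmed in this thread; `regions_needed` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

size_limit = 200 * MIB   # raftstore.raft-log-gc-size-limit from the thread
count_limit = 10_000     # raftstore.raft-log-gc-count-limit from the thread
observed = 36 * GIB      # du -h output for ./raft-engine (one TiKV instance)

def regions_needed(observed_bytes, per_region_limit):
    """Smallest Region count whose combined per-Region limit covers observed_bytes."""
    return -(-observed_bytes // per_region_limit)  # ceiling division on integers

# Reading 1: size-limit caps the whole directory.
# Reading 2: size-limit * count-limit caps the whole directory.
candidates = {
    "size-limit as a directory-wide cap": size_limit,
    "size-limit * count-limit": size_limit * count_limit,
}
for name, cap in candidates.items():
    print(f"{name}: cap = {cap / GIB:.2f} GiB, observed / cap = {observed / cap:.3f}")

# Reading 3 (assumption): size-limit applies to each Region separately.
print("Regions needed at the full per-Region limit:", regions_needed(observed, size_limit))
```

With these inputs, the first reading is exceeded by a factor of about 184, the second gives a cap of roughly 1.9 TiB that the observed 36G never approaches, and the per-Region reading would need 185 Regions each sitting at the full 200MiB. None of this shows which reading TiKV implements; it only shows which ones are consistent with the observed size.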

| username: wangccsy | Original post link

The parameter configuration is too complicated.

| username: dba远航 | Original post link

The reply in post #2 is correct.

| username: Kongdom | Original post link

:thinking: I suspect this limit applies to a single node. According to the original poster's screenshot, three nodes appear to be deployed on one machine. So when calculating the occupied space, should we multiply by 3 to get the upper limit? Does that match the observed results?

| username: wengsy | Original post link

Yes, our cluster has three TiKV instances deployed on the same physical node.

However, the three TiKV instances use different directories. The raft-engine directory shown here belongs to just one of them. Overall, each instance's raft-engine directory occupies 36G, so the three together take more than 100G, while the parameter says 200MiB.

I agree that these parameters describe the situation of a single TiKV instance, but it still seems inconsistent.

| username: 江湖故人 | Original post link

Try setting the raft-log-gc-count-limit to half of the current file count and see if it has any effect.

| username: 江湖故人 | Original post link

Any one of these three parameters reaching the threshold should trigger the cleanup, right?
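Written out as code, the rule guessed at here would look roughly like the Python sketch below. It only restates the conjecture that exceeding any one of the three limits triggers cleanup; it is not TiKV's actual logic, and the names `GcLimits` and `exceeded_limits` are hypothetical. The units (log entries for the threshold and count limit, bytes for the size limit) are also an assumption.

```python
from dataclasses import dataclass

@dataclass
class GcLimits:
    threshold: int    # raftstore.raft-log-gc-threshold (assumed: log entries)
    count_limit: int  # raftstore.raft-log-gc-count-limit (assumed: log entries)
    size_limit: int   # raftstore.raft-log-gc-size-limit (bytes)

def exceeded_limits(entries, size_bytes, limits):
    """Return the names of the limits that a residual log of this size exceeds."""
    hits = []
    if entries > limits.threshold:
        hits.append("threshold")
    if entries > limits.count_limit:
        hits.append("count_limit")
    if size_bytes > limits.size_limit:
        hits.append("size_limit")
    return hits

# Values from the SHOW CONFIG output in the main post.
limits = GcLimits(threshold=50, count_limit=10_000, size_limit=200 * 1024 ** 2)

# Under the conjecture, cleanup would be triggered whenever this list is non-empty.
print(exceeded_limits(entries=60, size_bytes=1024, limits=limits))
```

If the conjecture held for the directory as a whole, 36G would trip the size limit many times over, which is exactly the mismatch the original poster is asking about.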

| username: zhanggame1 | Original post link

I remember encountering this situation during testing. Restarting the TiDB cluster can clear it up.

| username: wengsy | Original post link

We have now reduced raft-log-gc-count-limit to 1/10 of its original value. Space usage is currently very small, so this works for now, but we still cannot determine the exact threshold that triggers cleanup. It is therefore only a temporary workaround that treats the symptom rather than the root cause, and we do not know when the problem might recur.

| username: wengsy | Original post link

Thank you for your solution!

Since this is still a test cluster, we simply used tiup to clear all the data in the cluster. We then adjusted the configuration parameters, and so far no issues have appeared during testing. In production, however, we definitely do not want this uncertainty or these manual steps. We therefore hope to understand how TiDB handles this and how these parameters actually work.

| username: zhanggame1 | Original post link

After testing TiDB, I found that TiKV occupies a lot of disk space. You really need to allocate more disk space.

| username: xfworld | Original post link

The raft log is used for snapshot catch-up, version verification, replay, and so on. The system retains it for a certain period to reduce the use of other resources.

If you are unsure about the parameters in this area, it is recommended to handle it in two ways:

  • Reserve a reliable space ratio for retention and automatic cleanup.
  • Test the adjusted value of the parameter based on the actual business scenario to see if it meets the requirements.

Please refer to


Additionally, it is recommended to use the LTS version…
