Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: PD monitoring panel -> Size amplification
The PD monitoring dashboard has a panel called “Size amplification.” According to the official documentation and how RocksDB works, this panel should show the ratio of the storage space actually occupied to the true size of the data. For example, if the true size of the data is 100MB but it occupies 200MB of storage, the ratio is 2.
- Because TiKV implements MVCC on top of RocksDB, a single piece of data can have multiple versions, and expired versions are not cleaned up immediately. These extra versions also occupy space.
- During compaction in RocksDB, for example when two pieces of data from an upper level are compacted into the next level, the original two pieces can only be deleted after the compaction completes, so the old and new copies coexist for a while. This also causes space amplification, which in this case can be taken as roughly 2.
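To make the arithmetic concrete, here is a minimal Python sketch of the ratio described above. The function name `space_amplification` and the 64 MiB file size are hypothetical and chosen only for illustration; this is not how PD or RocksDB computes the metric.

```python
MIB = 2 ** 20  # bytes in one MiB


def space_amplification(stored_bytes: int, logical_bytes: int) -> float:
    """Ratio of space occupied on disk to the true (logical) data size."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return stored_bytes / logical_bytes


# Example from the post: 100 MB of real data occupying 200 MB on disk.
ratio_example = space_amplification(200 * MIB, 100 * MIB)

# Compaction case (hypothetical sizes): the two input pieces and the merged
# output coexist until compaction finishes. Worst case, nothing is dropped,
# so the output is as large as the inputs combined.
input_bytes = 2 * 64 * MIB
output_bytes = input_bytes
ratio_compaction = space_amplification(input_bytes + output_bytes, input_bytes)

print(ratio_example, ratio_compaction)
```

Both cases give a ratio of 2. In the compaction case the extra space is temporary and is released once the input pieces are deleted.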
So, can we conclude from this metric that a very high space amplification ratio may indicate that old versions are not being cleaned up in time, for example because the GC time is set too long or GC is not working effectively? Is this conclusion correct?
However, my question is: why is the PromQL expression written this way? Why does it need to be multiplied by 2^20?
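One possible reading, which this post does not confirm, is that the factor is a unit conversion: 2^20 is the number of bytes in 1 MiB, so multiplying by it would be needed if one side of the ratio is reported in MiB and the other in bytes. The sketch below only illustrates that arithmetic. The function and parameter names are hypothetical, and which metric uses which unit is an assumption.

```python
MIB = 2 ** 20  # 1 MiB = 1,048,576 bytes


def ratio_with_unit_fix(size_in_mib: float, size_in_bytes: float) -> float:
    """Divide a MiB-valued quantity by a byte-valued one, then rescale.

    Assumption: the numerator is in MiB and the denominator is in bytes.
    """
    return size_in_mib / size_in_bytes * MIB


# 200 MiB against 100 MiB expressed in bytes.
unscaled = 200 / (100 * MIB)               # off by a factor of 2^20
scaled = ratio_with_unit_fix(200, 100 * MIB)

print(unscaled, scaled)
```

Without the factor the result is a tiny number on the order of 1e-6. With it, both quantities are effectively in the same unit and the result is a plain ratio.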