Why is the compression ratio 0.01 in PD monitoring, shouldn't it be greater than 1?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd监控里压缩比为什么是0.01,不是应该大于1?

| username: Soysauce520

[TiDB Usage Environment] Test
[TiDB Version] v6.5
[Observed Phenomenon]
[screenshot of the PD monitoring panel]

This is my test environment with very little data. Why is the storage compression ratio 0.01? Does that mean the data has expanded instead of being compressed? Does anyone have any insights?

| username: zhanggame1 | Original post link

123G is the available hard disk size, and the right side is the actual usage.

| username: tidb菜鸟一只 | Original post link

Storage capacity is your total available capacity, current is how much you are currently using.

| username: Soysauce520 | Original post link

What I'm asking about is mainly the Size amplification in this panel :rofl:

| username: Soysauce520 | Original post link

Uh. What I want to ask about is the picture below; the picture above just indicates that the test environment has a very small amount of data.

| username: 源de爸 | Original post link

Open the panel and take a look: how is this metric's expression configured?

| username: TiDBer_小阿飞 | Original post link

  • Size amplification: the space amplification ratio of each TiKV instance.
    You can check the Size amplification metric under the PD - statistics balance panel in Grafana; the cluster's average compression ratio is the average Size amplification across all nodes.
    To dig deeper, look at how RocksDB stores data, i.e. the data kept at levels L0, L1, L2, L3, and L4.
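
For reference, here is a minimal Python sketch of pulling this panel's per-store values from Prometheus and averaging them across the cluster; the Prometheus address is an assumption, and the expression is the one Soysauce520 quotes further down.

```python
# Minimal sketch: fetch each store's Size amplification from Prometheus and
# average it across the cluster. Adjust PROM to match your deployment; the
# expression mirrors the Grafana panel's query.
import requests

PROM = "http://127.0.0.1:9090"  # assumed Prometheus address of the cluster
EXPR = (
    'sum(pd_scheduler_store_status{type="region_size"}) by (address, store) '
    '/ sum(pd_scheduler_store_status{type="store_used"}) by (address, store) * 2^20'
)

resp = requests.get(f"{PROM}/api/v1/query", params={"query": EXPR}, timeout=10)
resp.raise_for_status()
series = resp.json()["data"]["result"]

# One sample per store: value is [timestamp, "<number as string>"].
ratios = {s["metric"].get("address", "unknown"): float(s["value"][1]) for s in series}
for store, ratio in ratios.items():
    print(f"{store}: size amplification = {ratio:.2f}")
if ratios:
    print(f"cluster average = {sum(ratios.values()) / len(ratios):.2f}")
```
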
| username: 连连看db | Original post link

This value changes over time; as the amount of data grows, it will gradually exceed 1.

| username: DBAER | Original post link

Without a large amount of data, there still aren't multiple levels, right?

| username: TIDB-Learner | Original post link

How does this compression ratio compare to others? How was your data generated? Normally, it should be 1+.

| username: Soysauce520 | Original post link

```
sum(pd_scheduler_store_status{type="region_size"}) by (address, store)
/ sum(pd_scheduler_store_status{type="store_used"}) by (address, store) * 2^20
```
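
To see why a tiny test cluster can land at 0.01 here: the 2^20 factor implies region_size is reported in MiB while store_used is in bytes, and on a nearly empty store the physical usage is dominated by fixed RocksDB/TiKV overhead (WAL, SST file overhead, raft data, and so on). A rough illustration with made-up numbers:

```python
# Made-up numbers: on a nearly empty test store, fixed physical overhead
# dwarfs the logical region size, so the ratio drops far below 1.
region_size_mib = 1            # logical data PD sees for the store, in MiB (hypothetical)
store_used_bytes = 100 << 20   # physical space the store reports, ~100 MiB (hypothetical)
print(f"{region_size_mib * 2**20 / store_used_bytes:.2f}")   # -> 0.01

# With more data, the logical size grows while the fixed overhead stays
# roughly constant and compression kicks in at the lower levels, so the
# ratio climbs past 1.
region_size_mib = 10 * 1024          # ~10 GiB of logical data (hypothetical)
store_used_bytes = 4 * 1024 << 20    # ~4 GiB actually on disk (hypothetical)
print(f"{region_size_mib * 2**20 / store_used_bytes:.2f}")   # -> 2.50
```
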
| username: Soysauce520 | Original post link

I just wrote some random data for testing, and this is the result. I also thought it should be above 1.

| username: Soysauce520 | Original post link

Does it have anything to do with multi-layer?

| username: Soysauce520 | Original post link

Is there a detailed explanation?

| username: Soysauce520 | Original post link

Where can I find reference materials?

| username: Daniel-W | Original post link

It's related to the amount of data; the data volume determines which RocksDB level the data ends up in.
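
A back-of-the-envelope sketch of this, assuming RocksDB-style leveled compaction with a 256 MiB base level target and a 10x per-level size multiplier (illustrative defaults, not TiKV's exact settings): a small data set never leaves the upper levels, while a large one spills into the deeper, compressed levels.

```python
# Back-of-the-envelope: distribute a given amount of data across RocksDB
# levels, assuming L1 targets 256 MiB and each deeper level is 10x larger.
# These numbers are illustrative, not TiKV's exact configuration.
def level_layout(total_mib, l1_target_mib=256, multiplier=10, max_level=6):
    layout = {}
    remaining = total_mib
    for level in range(1, max_level + 1):
        capacity = l1_target_mib * multiplier ** (level - 1)
        placed = min(remaining, capacity)
        if placed > 0:
            layout[f"L{level}"] = placed
        remaining -= placed
        if remaining <= 0:
            break
    return layout

print(level_layout(50))        # tiny test data set: stays in L1
print(level_layout(500_000))   # ~500 GiB: reaches the deeper, compressed levels
```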

| username: TiDBer_小阿飞 | Original post link

[The original reply linked to a Zhihu article.]

| username: zhang_2023 | Original post link

It changes over time.

| username: zhanggame1 | Original post link

To add: of the data stored at levels L0/L1/L2/L3/L4, by default levels 0 and 1 should be uncompressed.
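
A toy estimate of how this plays out, assuming per-level compression roughly like TiKV's defaults (no compression on L0/L1, lz4 on the middle levels, zstd on the bottom levels), with made-up compression factors and fixed overhead:

```python
# Toy estimate: overall logical-to-physical ratio as a weighted average over
# the levels that actually hold data. Compression factors and the fixed
# overhead are hypothetical; only the "L0/L1 uncompressed" part follows the
# defaults discussed above.
compression_factor = {                  # logical size / physical size per level
    "L0": 1.0, "L1": 1.0,               # uncompressed
    "L2": 2.0, "L3": 2.0, "L4": 2.0,    # lz4-ish (hypothetical factor)
    "L5": 3.5, "L6": 3.5,               # zstd-ish (hypothetical factor)
}

def overall_ratio(logical_mib_per_level, fixed_overhead_mib=100):
    """Logical size divided by estimated physical size (including fixed overhead)."""
    logical = sum(logical_mib_per_level.values())
    physical = fixed_overhead_mib + sum(
        size / compression_factor[level]
        for level, size in logical_mib_per_level.items()
    )
    return logical / physical

# Tiny test cluster: everything still sits uncompressed in L0/L1 -> well below 1.
print(round(overall_ratio({"L0": 1, "L1": 2}), 2))
# Larger data set: most data lives in compressed lower levels -> above 1.
print(round(overall_ratio({"L1": 256, "L3": 20_000, "L5": 200_000}), 2))
```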

| username: zhaokede | Original post link

So detailed!