Reasons for Changes in TiKV Store Size Metrics

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV store size 指标变化的原因

| username: Daniel-W

[Test Environment for TiDB] Testing
[TiDB Version] 6.5.3 and 5.1.4
TiKV-Details in 5.1.4 -> store size uses the metric sum(tikv_engine_size_bytes)
TiKV-Details in 6.5.3 -> store size uses the metric sum(tikv_store_size_bytes{type="used"})

In 5.1.4, tikv_store_size_bytes does not have type="used"

There is a significant difference between the values of sum(tikv_engine_size_bytes) and sum(tikv_store_size_bytes{type="used"}) in 6.5.3. What is causing this difference?
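
For anyone who wants to reproduce the comparison outside Grafana, here is a rough Python sketch that runs both expressions against the Prometheus HTTP API (`/api/v1/query`) and reports the gap. The Prometheus address and the helper names are assumptions for illustration; only the two expressions come from the dashboards mentioned above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the Prometheus server that scrapes this cluster.
PROMETHEUS = "http://127.0.0.1:9090"

# The two expressions used by the "store size" panel in each version.
QUERIES = {
    "engine": "sum(tikv_engine_size_bytes)",
    "store_used": 'sum(tikv_store_size_bytes{type="used"})',
}


def query_url(base, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    return base.rstrip("/") + "/api/v1/query?" + params


def scalar_from_response(payload):
    """Extract the value from an instant-vector response holding one sample."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected one sample, got {len(result)}")
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def fetch(base, expr):
    """Run one instant query and return its single value in bytes."""
    with urllib.request.urlopen(query_url(base, expr), timeout=10) as resp:
        return scalar_from_response(json.load(resp))


def compare(base=PROMETHEUS):
    """Return both values plus their difference (store_used minus engine)."""
    values = {name: fetch(base, expr) for name, expr in QUERIES.items()}
    values["difference"] = values["store_used"] - values["engine"]
    return values
```

Calling `compare()` against a reachable Prometheus returns the two sums and their difference in bytes, which makes it easier to track how the gap changes over time.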

| username: 大飞哥online | Original post link

The store size on the TiKV monitoring panel is the total size of the data files (SST files of RocksDB) for a single TiKV instance.

| username: Daniel-W | Original post link

What are the differences in monitoring metrics between versions 6.5 and 5.1?

| username: 大飞哥online | Original post link

I checked the on-disk size of /tidb-data/tikv-20162/db on the server, and it matches the store size shown on the panel.
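
A rough way to repeat this check in Python is to walk the data directory and add up the file sizes. This is only a sketch: the function name is made up for illustration, and it sums apparent file sizes, so the result can differ slightly from what `du` reports in disk blocks.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (apparent size)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so linked files are not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files can disappear mid-walk, e.g. SSTs removed by compaction.
                continue
    return total


if __name__ == "__main__":
    # Path taken from this post; adjust it to your own deployment.
    print(dir_size_bytes("/tidb-data/tikv-20162/db"))
```

The printed number is in bytes and can be compared directly with the value of the metric for the same TiKV instance.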

| username: 大飞哥online | Original post link

I’m not sure about this. You can check the release notes of past versions to see whether the change is documented.

| username: Daniel-W | Original post link

I didn’t find it documented there.

| username: Billmay表妹 | Original post link

Version updates of TiKV can lead to changes in metrics, so the metrics used for store size may differ across different versions of TiKV. In TiKV version 5.1.4, the metric used for store size is sum(tikv_engine_size_bytes), whereas in TiKV version 6.5.3, the metric used is sum(tikv_store_size_bytes{type="used"}).

The significant difference in values between these two metrics may be due to the different data they account for. tikv_engine_size_bytes measures the data size at the TiKV engine layer, including the size of SST files in RocksDB, WAL files, etc., while tikv_store_size_bytes measures the data size at the TiKV storage layer, including Region data size, Raft Log size, etc. Therefore, it is normal for these two metrics to have significantly different values.
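
One way to see what each metric actually counts is to drop the outer sum() and total the raw series per label. The sketch below does that for samples in the shape Prometheus returns under `data.result`. The sample data is made up, and which labels each metric carries in a given TiKV version (for example `type`) is an assumption to verify against your own Prometheus.

```python
from collections import defaultdict


def sum_by_label(result, label):
    """Total the sample values of a Prometheus instant vector per value of `label`."""
    totals = defaultdict(float)
    for sample in result:
        # Series without the label are grouped under a placeholder key.
        key = sample["metric"].get(label, "<none>")
        # Each sample's "value" is [unix_timestamp, "value-as-string"].
        totals[key] += float(sample["value"][1])
    return dict(totals)


# Made-up samples, shaped like the data.result list of /api/v1/query.
example = [
    {"metric": {"instance": "tikv-1", "type": "used"}, "value": [0, "100"]},
    {"metric": {"instance": "tikv-2", "type": "used"}, "value": [0, "150"]},
    {"metric": {"instance": "tikv-1", "type": "capacity"}, "value": [0, "1000"]},
]
print(sum_by_label(example, "type"))
```

Running the same grouping on real results for both metrics shows which label values exist in each version and how much each one contributes to the total.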

If you need more detailed metric information, you can use the monitoring panel in TiDB Dashboard, where you can view detailed metrics for each component and customize the configuration as needed. For the specific steps, refer to the relevant sections of the official TiDB documentation [1].

| username: 有猫万事足 | Original post link

It may be related to the partitioned-raft-kv feature described below.

From the difference in metrics, the metric tikv_engine_size_bytes is collected from RocksDB.

https://github.com/search?q=repo%3Atikv%2Ftikv+tikv_engine_size_bytes&type=code

The metric tikv_store_size_bytes, on the other hand, is collected by TiKV itself.

https://github.com/search?q=repo%3Atikv%2Ftikv+tikv_store_size_bytes&type=code

So a new metric was most likely needed because RocksDB statistics alone could no longer cover all cases.

Partitioned-raft-kv, which has been rolled out gradually since version 6.6, happens to use multiple RocksDB instances.
It seems this metric change was introduced as part of that improvement.

| username: Daniel-W | Original post link

Thank you for the explanation.

| username: Daniel-W | Original post link

Thanks a lot, boss!
