Thoughts on TiDB View Statistics Size, Display Size in Monitoring, and Actual Disk Usage?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 视图统计大小、监控中显示大小、实际磁盘占用空间思考?

| username: 麻烦是朋友

TiDB View Statistics Size, Monitoring Display Size, Actual Disk Usage Space Consideration? Found that the actual usage space matches the view and monitoring space. How to accurately calculate table size and database size?
Disk 34 GB Monitoring 49 View 142. The ratio of view to monitoring is about 3:1

Cluster Information:
Starting component cluster: /home/tidb/.tiup/components/cluster/v1.12.3/tiup-cluster display lbtest
Cluster type: tidb
Cluster name: lbtest
Cluster version: v6.5.3
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://xxxxxxxxxx:2479/dashboard
Grafana URL: http://xxxxxxxxxx:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


xxxxxxxxxx:3000 grafana xxxxxxxxxx 3000 linux/x86_64 Up - /data/tidb-deploy/grafana-3000
xxxxxxxxxx:2379 pd xxxxxxxxxx 2379/2380 linux/x86_64 Up|L /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
xxxxxxxxxx:2479 pd xxxxxxxxxx 2479/2480 linux/x86_64 Up|UI /data/tidb-data/pd-2479 /data/tidb-deploy/pd-2479
xxxxxxxxxx:9090 prometheus xxxxxxxxxx 9090/12020 linux/x86_64 Up /data/tidb-data/prometheus-9090 /data/tidb-deploy/prometheus-9090
xxxxxxxxxx:4000 tidb xxxxxxxxxx 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
xxxxxxxxxx:5000 tidb xxxxxxxxxx 5000/10081 linux/x86_64 Up - /data/tidb-deploy/tidb-5000
xxxxxxxxxx:20160 tikv xxxxxxxxxx 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
xxxxxxxxxx:20161 tikv xxxxxxxxxx 20161/20181 linux/x86_64 Up /data/tidb-data/tikv-20161 /data/tidb-deploy/tikv-20161
xxxxxxxxxx:20162 tikv xxxxxxxxxx 20162/20182 linux/x86_64 Up /data/tidb-data/tikv-20162 /data/tidb-deploy/tikv-20162

View Query Database Total Size: 142GB

Prometheus Monitoring: 49.2GB

Disk Storage Information:
Statistics of db disk space usage: 34GB
[tidb@xxxxxxxxxxxx tidb-data]$ du -sh *
4.0K cdc-data
4.0K monitor-9100
414M pd-2379
485M pd-2479
36G prometheus-9090
101G tikv-20160
98G tikv-20161
97G tikv-20162
[tidb@xxxxxxxxxxxx tidb-data]$
[tidb@xxxxxxxxxxxx tidb-data]$ cd tikv-20160
**[tidb@xxxxxxxxxxxx tikv-20160]$ du -sh ***
11G db
12K import
20K last_tikv.toml
0 LOCK
0 raftdb.info
6.9G raft-engine
301M rocksdb-2023-07-28T16-12-18.302.info
301M rocksdb-2023-09-08T10-48-42.537.info
301M rocksdb-2023-10-13T00-09-34.548.info
301M rocksdb-2023-11-21T02-57-02.361.info
301M rocksdb-2023-12-27T18-13-02.481.info
146M rocksdb.info
4.0K snap
82G space_placeholder_file
[tidb@xxxxxxxxxxxx tikv-20160]$ cd …/tikv-20161
**[tidb@xxxxxxxxxxxx tikv-20161]$ du -sh ***
11G db
12K import
20K last_tikv.toml
0 LOCK
0 raftdb.info
4.4G raft-engine
301M rocksdb-2023-07-27T20-22-10.824.info
301M rocksdb-2023-09-04T21-20-42.265.info
301M rocksdb-2023-10-10T13-52-08.023.info
301M rocksdb-2023-11-19T15-42-26.124.info
301M rocksdb-2023-12-28T07-15-21.382.info
141M rocksdb.info
4.0K snap
82G space_placeholder_file
[tidb@xxxxxxxxxxxx tikv-20161]$ cd …/tikv-20162
**[tidb@xxxxxxxxxxxx tikv-20162]$ du -sh ***
12G db
12K import
20K last_tikv.toml
0 LOCK
0 raftdb.info
2.5G raft-engine
301M rocksdb-2023-07-27T20-09-17.561.info
301M rocksdb-2023-09-06T05-07-10.879.info
301M rocksdb-2023-10-15T01-26-02.859.info
301M rocksdb-2023-11-22T23-59-09.160.info
301M rocksdb-2023-12-29T07-30-34.125.info
139M rocksdb.info
4.0K snap
82G space_placeholder_file

| username: zhanggame1 | Original post link

Recently, I noticed that the standby database is taking up twice as much disk space as the primary database for the same data. I really can’t figure out why.

| username: 麻烦是朋友 | Original post link

Let’s see how the official response.

| username: zhanggame1 | Original post link

They said they have encountered this issue and are working on improvements.

| username: wluckdog | Original post link

I use the MySQL method to query the table size.

| username: 麻烦是朋友 | Original post link

Waiting for my cousin from Guangzhou to appear.

| username: dba远航 | Original post link

I have also never understood this issue.

| username: zhanggame1 | Original post link

This has nothing to do with physical space usage.

| username: hey-hoho | Original post link

Currently, there is no particularly accurate way to calculate the actual data size of a single table or single database. The results obtained from system views are all estimated based on statistical information.

For space calculation in monitoring, you can refer to this article:

| username: tidb菜鸟一只 | Original post link

Run this query in TiDB:

SELECT
    db_name,
    table_name,
    ROUND(SUM(total_size / cnt), 2) AS Approximate_Size,
    ROUND(SUM(total_size / cnt / (SELECT
        ROUND(AVG(VALUE), 2)
        FROM
        METRICS_SCHEMA.store_size_amplification
        WHERE
        VALUE > 0)),
    2) AS Disk_Size
FROM
    (SELECT
        db_name,
        table_name,
        region_id,
        SUM(Approximate_Size) AS total_size,
        COUNT(*) AS cnt
    FROM
        information_schema.TIKV_REGION_STATUS
    WHERE
        db_name = 'sbtest'
        AND table_name IN ('sbtest1')
    GROUP BY db_name, table_name, region_id) AS tabinfo
GROUP BY db_name, table_name;
| username: zhanggame1 | Original post link

Regarding the issue of inconsistent disk usage between the primary and standby databases that I mentioned last time, I used your SQL to check the total size of the entire database.

Primary database:
Approximate_Size: 249523.00
Disk_Size: 114987.56

Standby database:
Approximate_Size: 374578.00
Disk_Size: 309568.6

When checking a view similar to MySQL’s tables view, it only shows 26929.02MB. I have no idea why the space usage is so outrageous, differing by an order of magnitude.

| username: tidb菜鸟一只 | Original post link

How about directly checking the directory size on the host in MySQL? Tables are based on statistics, but the statistics may not be accurate.

| username: wluckdog | Original post link

In a distributed environment, checking a specific table from the host is still not very convenient. I am also looking for an accurate way to estimate the table size.

| username: zhanggame1 | Original post link

If the tables have been analyzed, they are still basically accurate. The directory is full of SST files, which is why it’s so frustrating that they take up so much space.

| username: 麻烦是朋友 | Original post link

Yes, this issue may lead to deviations in space resource assessment, which could affect the accuracy of procurement quantities for the following year.

| username: zhanggame1 | Original post link

Yes, what I’m concerned about is how large the capacity should be when purchasing server hard drives.

| username: Jayjlchen | Original post link

The calculation of table size can be found on the official website here.
TiDB Cluster Management FAQ | PingCAP Documentation Center