How to Understand TiDB Space Usage Statistics

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb空间大小统计如何理解

| username: 人如其名

[TiDB Usage Environment] Poc
[TiDB Version] 6.3
[Encountered Problem] How to understand space size statistics

Question:
Are the CAPACITY, AVAILABLE, and REGION_SIZE in the tikv_store_status table real-time actual sizes, and are they before or after compression?
The APPROXIMATE_SIZE in the tikv_region_status table is an estimated size. How is this estimate derived?
Do TABLE_ROWS and DATA_LENGTH in the tables table only come from statistics? If the table is large but the statistics are small, will the values always be relatively small?
Are the storage_capacity and storage_size in the METRICS_SCHEMA.pd_cluster_status monitoring view actual sizes? How do they relate to CAPACITY and AVAILABLE in the tikv_store_status table? Which is more accurate, and are they before or after compression?

mysql> select * from tikv_store_status where store_id=1 \G
*************************** 1. row ***************************
         STORE_ID: 1
          ADDRESS: 127.0.0.1:20160
      STORE_STATE: 0
 STORE_STATE_NAME: Up
            LABEL: null
          VERSION: 6.3.0
         CAPACITY: 233.5GiB
        AVAILABLE: 87.66GiB
     LEADER_COUNT: 2
    LEADER_WEIGHT: 1
     LEADER_SCORE: 2
      LEADER_SIZE: 2
     REGION_COUNT: 2
    REGION_WEIGHT: 1
     REGION_SCORE: 6.726269909837314
      REGION_SIZE: 2
         START_TS: 2022-10-10 13:07:09
LAST_HEARTBEAT_TS: 2022-10-10 15:09:52
           UPTIME: 2h2m43.071582s
1 row in set (0.00 sec)

mysql> select * from tikv_region_status limit 1 \G
*************************** 1. row ***************************
                REGION_ID: 71
                START_KEY: 
                  END_KEY: 7480000000000000FF4600000000000000F8
                 TABLE_ID: 4
                  DB_NAME: mysql
               TABLE_NAME: user
                 IS_INDEX: 1
                 INDEX_ID: 1
               INDEX_NAME: PRIMARY
           EPOCH_CONF_VER: 1
            EPOCH_VERSION: 68
            WRITTEN_BYTES: 635
               READ_BYTES: 174287
         APPROXIMATE_SIZE: 1
         APPROXIMATE_KEYS: 1730
  REPLICATIONSTATUS_STATE: NULL
REPLICATIONSTATUS_STATEID: NULL
1 row in set (0.01 sec)

mysql> select * from tables limit 1 \G
*************************** 1. row ***************************
             TABLE_CATALOG: def
              TABLE_SCHEMA: INFORMATION_SCHEMA
                TABLE_NAME: REFERENTIAL_CONSTRAINTS
                TABLE_TYPE: SYSTEM VIEW
                    ENGINE: InnoDB
                   VERSION: 10
                ROW_FORMAT: Compact
                TABLE_ROWS: 0
            AVG_ROW_LENGTH: 0
               DATA_LENGTH: 0
           MAX_DATA_LENGTH: 0
              INDEX_LENGTH: 0
                 DATA_FREE: 0
            AUTO_INCREMENT: NULL
               CREATE_TIME: 1970-01-01 08:00:00
               UPDATE_TIME: NULL
                CHECK_TIME: NULL
           TABLE_COLLATION: utf8mb4_bin
                  CHECKSUM: NULL
            CREATE_OPTIONS: 
             TABLE_COMMENT: 
             TIDB_TABLE_ID: 4611686018427387918
 TIDB_ROW_ID_SHARDING_INFO: NULL
              TIDB_PK_TYPE: NONCLUSTERED
TIDB_PLACEMENT_POLICY_NAME: NULL
1 row in set (0.01 sec)

mysql> select * from METRICS_SCHEMA.pd_cluster_status where time=now() and type like 'storage_%';
+----------------------------+----------------+------------------+--------------+
| time                       | instance       | type             | value        |
+----------------------------+----------------+------------------+--------------+
| 2022-10-10 15:12:28.000000 | 127.0.0.1:2379 | storage_capacity | 501371150336 |
| 2022-10-10 15:12:28.000000 | 127.0.0.1:2379 | storage_size     |    303885665 |
+----------------------------+----------------+------------------+--------------+
2 rows in set (0.00 sec)
| username: h5n1 | Original post link

REGION_SIZE is an estimated value based on the number of keys and the length of the keys, and the number of keys is not an exact value. The data in METRICS_SCHEMA comes from Prometheus monitoring data. The CAPACITY and AVAILABLE in tikv_store_status should also be the actual sizes, but there might be unit conversion differences with METRICS_SCHEMA. You can convert it to bytes using 1024 and then to TB using 1000 for comparison. The TABLE_ROWS and DATA_LENGTH in the tables are estimated based on statistical information.

| username: TiDBer_CEVsub | Original post link

You can use AVG_ROW_LENGTH and TABLE_ROWS from the tables table for calculations.