Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 关于TIKV_REGION_STATUS、TIKV_REGION_PEERS及监控中tidb-acloud-overview的当前数据量疑问
Question 1:
What is the relationship between the number of regions in tikv_region_peers and tikv_region_status?
The documentation says that tikv_region_status contains the basic information of TiKV regions. I understand this to mean the total number of regions in the cluster?
What does the number of regions in the tikv_region_peers table represent?
Question 2:
The current data volume in the monitoring is 1T.
When I calculate the region size in tikv_region_status and compare it with the current data volume in the monitoring, the difference is too large.
How can I determine the total number of regions in the cluster? How can I query this?
The roles and meanings of these two tables:
-
tikv_region_status
table: This table provides basic information about each Region in the TiKV cluster, including Region ID, start key, end key, table ID, database name, table name, etc. [1]. By querying this table, you can obtain detailed information about all Regions in the TiKV cluster.
-
tikv_region_peers
table: This table provides information about each Region’s Peers (replicas) in the TiKV cluster, including Peer ID, Store ID, whether it is a Learner or Leader, etc. [2]. By querying this table, you can understand the distribution of Peers for each Region.
The number of Regions in these two tables represents different meanings:
-
The number of Regions in the tikv_region_status
table represents the total number of Regions in the entire TiKV cluster. Each Region corresponds to a portion of the data in the TiKV cluster.
-
The number of Regions in the tikv_region_peers
table represents the number of Regions on each TiKV node. Each TiKV node is responsible for storing and processing a portion of the Region’s data and maintaining the Peer information of these Regions.
Therefore, the number of Regions in the tikv_region_status
table reflects the overall situation of the entire TiKV cluster, while the number of Regions in the tikv_region_peers
table reflects the local situation on each TiKV node.
[2]: TiDB Official Documentation - INFORMATION_SCHEMA.TIKV_REGION_PEERS: TIKV_REGION_PEERS | PingCAP 文档中心
[1]: TiDB Official Documentation - INFORMATION_SCHEMA.TIKV_REGION_STATUS: TIKV_REGION_STATUS | PingCAP 文档中心
- The number of Regions in the
tikv_region_status
table represents the total number of Regions in the entire TiKV cluster. Each Region corresponds to a portion of the data in the TiKV cluster.
- The number of Regions in the
tikv_region_peers
table represents the number of Regions on each TiKV node. Each TiKV node is responsible for storing and processing a portion of the Region’s data and maintaining the Peer information of these Regions.
Why aren’t they in a 3x relationship?
To clarify, this database is static data with no data modifications. It’s a test cluster that I am playing with.
The calculated result also differs greatly, not every region is 96M, I understand, but the result differs too much.
The size should be checked using the APPROXIMATE_SIZE field, not directly calculated as 96M.
TiKV data is compressed. APPROXIMATE_SIZE is the compressed size. 3 * APPROXIMATE_SIZE / compression ratio is the total disk usage of TiKV. Refer to this for the compression ratio:
There’s a big difference, your calculation formula is incorrect.
The first one is the number of replicas, and the second one is the number of regions. These are different concepts.
From the picture, it shows that you have three replicas. If you know how it is calculated, you will understand their differences.
This table lookup size is also inaccurate.
SELECT
db_name,
table_name,
ROUND(SUM(total_size / cnt), 2)/1024 Approximate_Size,
ROUND(SUM(total_size / cnt / (SELECT
ROUND(AVG(value), 2)
FROM
METRICS_SCHEMA.store_size_amplification
WHERE
value > 0)),
2)/1024 Disk_Size
FROM
(SELECT
db_name,
table_name,
region_id,
SUM(Approximate_Size) total_size,
COUNT(*) cnt
FROM
information_schema.TIKV_REGION_STATUS
WHERE
db_name = 'db_name'
-- AND table_name IN ('t_name')
GROUP BY db_name , table_name , region_id) tabinfo
GROUP BY db_name , table_name;
After this query, the total disk space size doesn’t match.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.