Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 关于表的大小
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.1
[Encountered Problem: Problem Phenomenon and Impact]
Recently, while testing hotspot table caching, I ran into a question. The official documentation states that the total data size of a cached table must not exceed 64 MB. I want to know the size of the table I want to cache: what is the exact SQL used to check against that 64 MB limit?
How to Estimate the Size of a Table in TiDB?
To estimate the size of a table in TiDB, you can refer to the following query:
SELECT
    db_name,
    table_name,
    ROUND(SUM(total_size / cnt), 2) Approximate_Size,
    ROUND(SUM(total_size / cnt /
        (SELECT ROUND(AVG(value), 2)
         FROM METRICS_SCHEMA.store_size_amplification
         WHERE value > 0)), 2) Disk_Size
FROM
    (SELECT
         db_name,
         table_name,
         region_id,
         SUM(Approximate_Size) total_size,
         COUNT(*) cnt
     FROM information_schema.TIKV_REGION_STATUS
     WHERE db_name = @dbname
       AND table_name IN (@table_name)
     GROUP BY db_name, table_name, region_id) tabinfo
GROUP BY db_name, table_name;
When using the above query, replace the following placeholders according to your actual situation:
- @dbname: the database name.
- @table_name: the target table name.
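For example, assuming a hypothetical database test containing a table users2 (both names are just placeholders), the WHERE clause of the inner query would become:

WHERE db_name = 'test'
  AND table_name IN ('users2')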
Additionally, in the above query:
- store_size_amplification represents the average compression ratio of the cluster. Besides querying it with SELECT * FROM METRICS_SCHEMA.store_size_amplification;, you can also check the Size amplification metric of each node under the PD - statistics balance panel in Grafana monitoring; the cluster's average compression ratio is the average of the Size amplification values across all nodes.
- Approximate_Size represents the size of a single replica of the table before compression. This value is an estimate, not an exact value.
- Disk_Size represents the size of the table after compression, which can be estimated from Approximate_Size and store_size_amplification.
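If you just want the cluster-wide average compression ratio on its own (this is the same subquery embedded in the statement above), you can run:

SELECT ROUND(AVG(value), 2) AS avg_size_amplification
FROM METRICS_SCHEMA.store_size_amplification
WHERE value > 0;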
The official documentation states that these are estimated values and may not match the actual size; getting the actual value might require other methods.
Then which value in the query result should I compare against the 64 MB from my question?
Try executing ALTER TABLE t CACHE to see if it fails.
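As a minimal sketch, assuming the table from earlier is test.users2 (an illustrative name):

ALTER TABLE test.users2 CACHE;
-- If the table data exceeds the 64 MB limit for cached tables,
-- this statement fails with an error and the table is not cached.
ALTER TABLE test.users2 NOCACHE;
-- Reverts the table to a normal table if the CACHE succeeded.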
Are you referring to this value?
Yes. Since there is a 64 MB limit, you should know how the table's size compares with 64 MB before performing the operation.
64 MB = 1024 × 64 KB = 1024 × 1024 × 64 bytes = 67,108,864 bytes. Excluding indexes and other overhead, varchar(10000) can hold 6,710 records. Calculated by hand, so I'm not sure it's correct.
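To double-check that arithmetic in SQL (assuming 1 byte per character and ignoring row and index overhead):

SELECT FLOOR(64 * 1024 * 1024 / 10000) AS approx_rows;
-- 67108864 / 10000 = 6710.8864, so roughly 6710 rows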
I just tried it, and the example I gave exceeds 64 MB.
That means the table was 216 MB before compression and 15.6 MB after compression, so it does exceed the 64 MB limit.
I verified it, and the check is indeed based on this value.
It can hold a little more than 2,200 records of varchar(10000).
SELECT ROUND(DATA_LENGTH/1024/1024, 2)
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'test'
  AND table_name = 'users2';
There is an issue with this statement: when the data volume decreases, the value does not change. I tried several times.
When the data volume increases, the value does change.
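One hedged explanation: in TiDB, DATA_LENGTH in INFORMATION_SCHEMA.TABLES is estimated from table statistics, so the value may lag behind deletes until the statistics are refreshed. Assuming the table is test.users2, you could try:

ANALYZE TABLE test.users2;  -- refresh the table statistics
SELECT ROUND(DATA_LENGTH/1024/1024, 2) AS size_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'test'
  AND table_name = 'users2';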
If the data volume is reduced, could it be that GC has not run yet, so the MVCC historical data is still included?
If we compare the estimated table size against 64 MB using this value, can we assume the table size returned by this statement is accurate? I have seen before that this kind of table-size query result can be inaccurate.
Previously, a TiDB database was backed up to MySQL. The size queried in TiDB was about 700 GB, while in MySQL it was about 1.3 TB. Both sizes were queried with the same kind of database-size statement: SELECT ROUND(DATA_LENGTH/1024/1024,2) FROM INFORMATION_SCHEMA.TABLES, that kind of SQL.
TiDB stores data in Regions, with 3 replicas by default, and MVCC history is also kept inside the Regions. The data is compressed on disk, and the compression ratio can be significant, so estimating the size is a bit difficult.
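To get a rough feel for that layout, you can count the Regions a table occupies using the same system table as the query in the first reply (again assuming the table is test.users2):

SELECT COUNT(*) AS region_count,
       SUM(approximate_size) AS total_size_mb  -- single-replica, pre-compression, in MiB
FROM information_schema.TIKV_REGION_STATUS
WHERE db_name = 'test'
  AND table_name = 'users2';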
Yes, so it’s unclear which data volume is accurate. There are already 3 replicas, so the data is redundant.
This statement corresponds to that statistics-based value, not the actual physical file size. We were often talking about the physical file size before; the physical file size should correspond to the query posted in the first reply.
Okay, thank you, understood.