Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB集群真实大小
How to check the actual size of TiDB cluster data
[TiDB Environment] Production
[TiDB Version] 5.0.6
[Problem Encountered] Doubts about the size shown in monitoring
[Reproduction Path] Operations performed that led to the issue
[Problem Phenomenon and Impact]
Monitoring shows 52T, but querying information_schema gives a very different figure. The difference is too large.
[Attachment]
Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.
The sizes reported in information_schema.tables are inaccurate because of how the system is structured, and the data is also stored as 3 replicas.
How can I find the actual data size? It feels like there’s a big difference right now.
The figure in monitoring is the size of the whole cluster, including all 3 replicas. In the data directory, check whether there are many log files, and how large the placeholder files, the snap directory, and the raftdb directory are.
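To make that directory check concrete, here is a minimal Python sketch that reports the size of each top-level entry in a data directory. The path in the usage note is a hypothetical placeholder, and the exact directory names depend on the deployment. It sums apparent file sizes, so preallocated or sparse files may differ from what `du` reports.

```python
import os

def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file was removed while walking, e.g. by compaction
    return total

def breakdown(data_dir):
    """Map each top-level entry of data_dir to its size in bytes."""
    sizes = {}
    for entry in os.scandir(data_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes
```

Running `breakdown("/path/to/tikv/data")` (hypothetical path) on each TiKV node and sorting the result by value shows whether the db directory, snapshots, Raft data, or logs dominate.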
The db directory is 2.7T.
The 52.2TB figure is basically accurate. You can verify it by adding up the sizes of the data directories on all TiKV nodes.
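As a rough sanity check on these numbers, the cluster-wide total can be divided by the replica count to approximate the size of a single copy of the data. This is only a sketch: the function names are made up for illustration, and it ignores compression, old MVCC versions awaiting GC, Raft logs, and snapshots.

```python
def cluster_total_tb(per_node_tb):
    """Add up the data directory sizes (in TB) measured on each TiKV node."""
    return sum(per_node_tb)

def single_replica_estimate_tb(total_tb, replicas=3):
    """Approximate size of one copy of the data, assuming every Region
    has the same number of replicas."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return total_tb / replicas

# 52.2 TB is the monitoring figure from this thread; 3 replicas as stated above.
estimate = single_replica_estimate_tb(52.2, replicas=3)
print(f"about {estimate:.1f} TB per replica")
```

With 52.2TB and 3 replicas this comes to roughly 17.4TB per copy, which is the number to compare against totals derived from information_schema, not the full 52.2TB.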
However, the business side says that the actual size of each database is not that large. Is there a dictionary table that can check the actual size of each database?
Now we need to investigate which database table is occupying the most space; some unnecessary data may need to be cleaned up.
Here’s a suggestion you can try: for a single table, take the data size in information_schema.tables, divide it by the number of rows in information_schema.tables, and then multiply by the table's real-time row count.
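The suggestion above amounts to scaling the average row size from the statistics by the current row count. A minimal sketch of that arithmetic follows; the parameter names are illustrative, with `data_length` and `table_rows` standing for the values read from information_schema.tables and `actual_rows` for a fresh count of the table.

```python
def estimate_table_bytes(data_length, table_rows, actual_rows):
    """Estimate a table's current size as average row size times live rows.

    data_length, table_rows: values taken from the table statistics.
    actual_rows: the current row count of the table.
    Returns None when the statistics report zero rows, because no
    average row size can be derived in that case.
    """
    if table_rows <= 0:
        return None
    avg_row_bytes = data_length / table_rows
    return avg_row_bytes * actual_rows
```

For example, if the statistics show 1000 bytes over 10 rows and the table now holds 25 rows, the estimate is 2500 bytes. The result is still an estimate and inherits any staleness in the statistics.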
The total size of all tables seen in information_schema.tables is much smaller than the size shown in the monitoring. I don’t know why.
Dictionary tables can only provide estimates, not precise values.
I ran this query as well, but the result still doesn’t match 52.2T, and I don’t know what the NULL entry is.
A total of 20T, with NULL occupying 15T, but the monitoring shows 52.2T.
Look for those large tables by sorting them according to the number of rows. You won’t find them by table size.
information_schema.tables is based on statistical information. If the statistical information is up-to-date, it should reflect the actual size of the object.
Currently, the largest database found is less than 1.6T, and the total is less than 20T. It’s very strange what is occupying the space.
How long is the GC time set to? You can check with select * from mysql.tidb. The db directory is 2.7T; how much space do the placeholder files, the snap directory, the raftdb directory, and the log files occupy in the data directory?