Actual Size of TiDB Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB集群真实大小

| username: wakaka

How can I check the actual data size of a TiDB cluster?

[TiDB Environment] Production
[TiDB Version] 5.0.6
[Problem Encountered] Doubts about the size shown in monitoring
[Problem Phenomenon and Impact]
Monitoring shows 52T, but querying information_schema gives a very different result. The gap between the two is too large.

| username: h5n1 | Original post link

The sizes in information_schema.tables are inaccurate because of how the system is structured, and the monitoring figure also includes all 3 replicas.

| username: wakaka | Original post link

How can I find the actual data size? It feels like there’s a big difference right now.

| username: h5n1 | Original post link

The monitoring figure is the size of the whole cluster, including all 3 replicas. Also check the data directory: are there many log files, and how large are the placeholder files, the snap directory, and the raftdb directory?
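The directory check suggested above can be sketched in Python. This is a minimal sketch, not an official TiKV tool: the function names `entry_size` and `breakdown` are made up for illustration, and no particular directory layout is assumed. It only sums the file sizes under each top-level entry of whatever path you pass in.

```python
import os

def entry_size(path):
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so the same data is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

def breakdown(data_dir):
    """Map each top-level entry of data_dir to its size in bytes, largest first."""
    sizes = {
        name: entry_size(os.path.join(data_dir, name))
        for name in os.listdir(data_dir)
    }
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

Calling `breakdown()` on a TiKV node's data directory would show which entries (db, snap, raftdb, logs, placeholder files) take the most space. The numbers are apparent file sizes, so they can differ slightly from what `du` reports for sparse files.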

| username: wakaka | Original post link

The db directory is 2.7T.

| username: wakaka | Original post link

There are 17 TiKV nodes.

| username: 啦啦啦啦啦 | Original post link

The 52.2TB figure is basically accurate. You can verify it by adding up the sizes of the data directories of all TiKV nodes.
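As a rough sanity check, the numbers already posted in this thread can be put side by side. The sketch below is only back-of-the-envelope arithmetic. It assumes the 2.7T db directory reported for one node is typical of all 17 nodes, which the thread does not confirm, and `cluster_summary` is a made-up helper name.

```python
def cluster_summary(monitored_tb, tikv_nodes, replicas, db_dir_tb):
    """Derive a few comparison figures from the values quoted in the thread."""
    return {
        # Average space per TiKV node implied by the monitoring total.
        "avg_per_node_tb": monitored_tb / tikv_nodes,
        # Size of a single copy of the data if the total covers all replicas.
        "one_replica_tb": monitored_tb / replicas,
        # Total of the db directories, ASSUMING every node matches the one reported.
        "db_dirs_total_tb": tikv_nodes * db_dir_tb,
    }

# Values from the thread: 52.2T monitored, 17 TiKV nodes, 3 replicas, 2.7T db dir.
summary = cluster_summary(52.2, 17, 3, 2.7)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the average per node is about 3.07T, which is in line with a 2.7T db directory plus logs, snapshots, and other files. A single replica works out to about 17.4T, the same order of magnitude as the "less than 20T" total reported later in this thread. The comparison is only approximate, since on-disk compression and old MVCC versions are not accounted for.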

| username: wakaka | Original post link

However, the business side says that the actual size of each database is not that large. Is there a dictionary table that can check the actual size of each database?

| username: wakaka | Original post link

Now we need to find out which databases and tables are occupying the most space; some unnecessary data may need to be cleaned up.

| username: forever | Original post link

Here’s a suggestion you can try: for a single table, divide the data size in information_schema.tables by the row count in information_schema.tables to get an average row size, and then multiply that by the table’s actual current row count.
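The estimate described above is simple arithmetic and can be sketched as follows. This assumes "data size" means the DATA_LENGTH column and "number of rows" means the TABLE_ROWS column of information_schema.tables, and that the real-time row count comes from something like `SELECT COUNT(*)`. The function name and the sample numbers are hypothetical.

```python
def estimate_table_bytes(data_length, table_rows, actual_rows):
    """Scale the statistics-based size by the table's actual row count.

    data_length and table_rows come from information_schema.tables;
    actual_rows is the current row count of the table.
    """
    if table_rows <= 0:
        raise ValueError("table_rows must be positive to derive an average row size")
    avg_row_bytes = data_length / table_rows
    return avg_row_bytes * actual_rows

# Hypothetical table: statistics say 8 GiB for 40 million rows,
# but the table now holds 100 million rows.
estimate = estimate_table_bytes(8 * 1024**3, 40_000_000, 100_000_000)
print(f"{estimate / 1024**3:.1f} GiB")
```

The result is still an estimate: it inherits whatever error is in the statistics-based average row size and says nothing about replicas or on-disk compression.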

| username: wakaka | Original post link

The total size of all tables seen in information_schema.tables is much smaller than the size shown in the monitoring. I don’t know why.

| username: 啦啦啦啦啦 | Original post link

Dictionary tables can only provide estimates, not precise values.

| username: wakaka | Original post link

I also ran this query, but the result still doesn’t match 52.2T, and I don’t know what the NULL entry means.

| username: wakaka | Original post link

The total is 20T, of which NULL accounts for 15T, but monitoring shows 52.2T.

| username: h5n1 | Original post link

Look for those large tables by sorting them according to the number of rows. You won’t find them by table size.

| username: alfred | Original post link

information_schema.tables is based on statistical information. If the statistical information is up-to-date, it should reflect the actual size of the object.

| username: wakaka | Original post link

Currently, the largest database I have found is less than 1.6T, and the total is less than 20T. I don’t know what is occupying the rest of the space.

| username: h5n1 | Original post link

How long is the GC time set to? You can check it with select * from mysql.tidb. The db directory is 2.7T, but how much space do the placeholder files, the snap directory, the raftdb directory, and the log files occupy in the data directory?