How to Check the Actual Size of Data in a TiDB Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何查看TiDB集群数据的真实大小

| username: wakaka

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.6
[Reproduction Path] There is a significant discrepancy between PD monitoring and the data dictionary, and there are database names showing as null.
[Encountered Issues: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

  1. Querying INFORMATION_SCHEMA.TABLES adds up to less than 10 TB:
    select sum(data_length+index_length)/1024/1024/1024 from INFORMATION_SCHEMA.tables;

  2. Querying TIKV_REGION_STATUS (excluding rows with NULL db_name) adds up to less than 30 TB.
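The step-2 query might look like the following (a sketch, not the poster's exact SQL; `APPROXIMATE_SIZE` in this system table is reported in MiB):

```sql
-- Sketch: per-database approximate size from PD's region metadata.
-- Rows with a NULL db_name (regions not mapped to a table) are excluded,
-- as in the poster's description.
SELECT db_name,
       SUM(approximate_size) / 1024 AS size_gib
FROM information_schema.TIKV_REGION_STATUS
WHERE db_name IS NOT NULL
GROUP BY db_name
ORDER BY size_gib DESC;
```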



There are two issues:

  1. What causes the db_name to be null?
  2. What causes the significant discrepancy between PD’s over 100T and the data dictionary, and how can it be resolved?
| username: Kongdom | Original post link

You can refer to similar posts

| username: wakaka | Original post link

That is my own post; the administrator probably thought it had been open too long and marked a best answer on my behalf. It doesn't feel resolved. The gap is too large: 30 TB versus over 100 TB.

| username: GreenGuan | Original post link

Check if there are any empty regions.

| username: Kongdom | Original post link

Yes, check the empty regions. I have a cluster where there is still actual space, but Grafana shows insufficient space. This is because there are too many empty regions, causing a significant difference between the values shown in Grafana and those seen in CentOS.
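One way to check for empty regions is through pd-ctl (a sketch; the PD address `127.0.0.1:2379` and the `tiup ctl` invocation for v5.0.6 are assumptions, adjust to your deployment):

```shell
# List regions PD considers empty (approximate_size ~ 1 MiB, no rows).
# A very long list here explains a Grafana-vs-disk discrepancy.
tiup ctl:v5.0.6 pd -u http://127.0.0.1:2379 region check empty-region
```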

| username: wakaka | Original post link

I've posted about the empty-region issue before: 空region不合并 (empty regions not merging) - TiDB Q&A community

| username: 裤衩儿飞上天 | Original post link

Grafana’s statistics also include the data volume of TiFlash.

| username: wakaka | Original post link

Even counting TiFlash, the gap is still large; it's now over 100 TB.

| username: WalterWj | Original post link

Go to the data disk and run du -h -d 1 (note: -s summarizes everything into a single line and conflicts with -d, so use -d 1 for a per-directory breakdown). This gives you the most accurate physical storage usage.
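For example (a runnable sketch; the TiKV data path `/data/tidb-data/tikv-20160` mentioned in the comment is an assumption, so the demo uses a throwaway directory instead):

```shell
# Sketch: break down physical usage one directory level deep, as you would
# on a TiKV data disk (e.g. /data/tidb-data/tikv-20160 -- path is an
# assumption; substitute your deployment's data_dir).
# Demo on a temporary directory so the commands run anywhere:
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/db" "$DATA_DIR/raft"
# -d 1 limits the breakdown to one level; -h prints human-readable sizes.
du -h -d 1 "$DATA_DIR"
```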

| username: Kongdom | Original post link

What I encountered before was that physical storage was, say, 30 GB, but Grafana showed 300 GB. With three replicas, the 1 TB disk was judged to have used 900 GB, so the node was considered short of space and all leader replicas were moved away, even though actual physical usage was only 30 GB.
In the end I rebuilt the cluster via backup and restore, and Grafana then displayed correctly.

| username: wakaka | Original post link

Physical usage is over 100 TB, but the business team thinks that's unreasonable and it shouldn't be this large. We want to identify the databases and tables with high usage, and the discrepancy is significant. Moreover, after dropping some large tables, the space still hasn't been released after waiting a day.

| username: wakaka | Original post link

This cluster is too large, backup and recovery are not very feasible.

| username: WalterWj | Original post link

If it feels unreasonable, check whether GC is stuck. For example, compaction being disabled or unreasonable GC settings can cause space usage to grow.

This has been encountered in the community before.

| username: wakaka | Original post link

The GC life time is set to 10 minutes, and compaction is not disabled. GC seems to have gotten stuck yesterday afternoon when some large tables were dropped.
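Whether GC is actually advancing can be checked from the `mysql.tidb` system table (a sketch; run against the cluster — if `tikv_gc_safe_point` stops moving forward between runs, GC is stuck):

```sql
-- Sketch: show the GC safe point, last run time, and related settings.
SELECT variable_name, variable_value
FROM mysql.tidb
WHERE variable_name LIKE 'tikv_gc%';
```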


| username: WalterWj | Original post link

Let’s wait and see.

| username: wakaka | Original post link

Yes, let's wait for this GC round to finish first. But the large gap in volume has existed all along; I suspect slow GC has caused a long-term backlog. If each round collects too slowly, it feels like GC will never catch up. Is there any way to speed it up?

| username: WalterWj | Original post link

You can try a newer version; as I recall there are optimizations such as compaction-filter GC.
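For reference, TiKV can run GC inside RocksDB compaction ("compaction filter GC"), which avoids a separate GC write workload. The relevant TiKV config item is `gc.enable-compaction-filter` (on by default in 5.0+; shown here only as an illustration):

```
# tikv.toml -- sketch, not a full config
[gc]
enable-compaction-filter = true
```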

| username: wakaka | Original post link

Our business has a large number of DDL operations. The official documentation states that DDL operations cannot be executed during an upgrade, which is difficult to avoid. Is there any other solution besides upgrading?

| username: 胡杨树旁 | Original post link

Could you manually merge the empty regions that aren't merging on their own? See PD Control 使用说明 | PingCAP 文档中心 (the PD Control user guide): merging adjacent regions, and enabling cross-table empty-region merge.
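The suggestion above could be carried out with pd-ctl along these lines (a sketch; the PD address, region IDs 1 and 2, and the `tiup ctl` version are assumptions to adapt to your cluster):

```shell
PD="http://127.0.0.1:2379"   # assumed PD endpoint

# Allow empty regions belonging to different tables to merge with each other.
tiup ctl:v5.0.6 pd -u "$PD" config set enable-cross-table-merge true

# Optionally raise the merge thresholds so more small regions qualify.
tiup ctl:v5.0.6 pd -u "$PD" config set max-merge-region-size 20
tiup ctl:v5.0.6 pd -u "$PD" config set max-merge-region-keys 200000

# Manually merge two adjacent regions by ID (IDs here are placeholders;
# take real IDs from `region check empty-region`).
tiup ctl:v5.0.6 pd -u "$PD" operator add merge-region 1 2
```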