How to Check the Data Size of a Specific Database in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB查看指定库的数据量大小

| username: 摊煎饼果子的蒙其

【TiDB Usage Environment】Production\Test Environment\POC
【TiDB Version】
【Encountered Issues】
【Reproduction Path】What operations were performed to encounter the issue
【Issue Phenomenon and Impact】
【Attachments】

  • Relevant logs, configuration files, Grafana monitoring (https://metricstool.pingcap.com/)
  • TiUP Cluster Display information
  • TiUP Cluster Edit config information
  • TiDB-Overview monitoring
  • Corresponding module’s Grafana monitoring (if any, such as BR, TiDB-binlog, TiCDC, etc.)
  • Corresponding module logs (including logs from 1 hour before and after the issue)

If the question is related to performance optimization or troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload.

| username: 摊煎饼果子的蒙其 | Original post link

Is it accurate, or is there a better calculation method?

| username: xiaohetao | Original post link

I haven’t found a better way to calculate the size of a single schema either.

| username: Kongdom | Original post link

You can refer to this, using monitoring as the standard:

| username: Ming | Original post link

Calculating the size of a specific database alone doesn’t seem to have a good method at the moment. It is relatively estimated for judgment, and there is no specific accurate algorithm. Only the size of the entire database and individual TiKV can be viewed through monitoring.

| username: gary | Original post link

  1. First, manually collect the statistics, then use the following syntax to check the estimated size of the specified database:
select table_schema, sum(data_length)/1024/1024 as data_length, sum(index_length)/1024/1024 as index_length, sum(data_length+index_length)/1024/1024 as sum from information_schema.tables where table_schema = "$table_name" group by table_schema;
  1. The specific values should be based on the monitoring data.