How can I find out how much disk space a table occupies in TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 中如何查询一张表在操作系统里的空间占用?

| username: OnTheRoad

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.2

A project requires deploying a TiDB cluster and migrating 30 TB of existing data from ES to TiDB. We need to estimate how many nodes each component requires.

With 3 replicas, 30 TB of data becomes 90 TB. If a single TiKV node should hold no more than 2 TB, that works out to 45 TiKV nodes. However, TiKV compresses data when storing it, so the actual footprint should be smaller.
Questions:

  1. What is the approximate compression ratio?

  2. How can we determine the space occupied by a data table in TiKV?

  3. The business scenario is OLAP. From a storage perspective alone, how many TiKV nodes are recommended for deployment?

| username: Raymond

  1. The compression ratio can only be estimated for now; it is said to reach up to about 9x. I recommend migrating a sample of the data from ES to TiKV first and measuring the ratio you actually get.
  2. First get the table's Regions with SHOW TABLE tablename REGIONS, then run tikv-ctl size -r <region_id> for each Region to see how much space it actually occupies.
  3. For an OLAP scenario, going by your calculation, 45 TiKV instances are needed. It should be possible to deploy multiple TiKV instances on a single machine.
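
To show how much the measured compression ratio changes the estimate, here is a minimal Python sketch of the storage-only arithmetic from this thread. The function name and parameters are hypothetical, the 9x figure is only the rough upper bound mentioned above, and the result ignores CPU, memory, and headroom for growth, so treat it as a starting point rather than a sizing rule.

```python
import math


def tikv_nodes_needed(logical_tb, replicas=3, compression_ratio=1.0,
                      max_tb_per_node=2.0):
    """Estimate the TiKV instance count from storage alone (hypothetical helper).

    logical_tb        -- uncompressed size of one copy of the data, in TB
    replicas          -- number of Raft replicas (3 in this thread)
    compression_ratio -- assumed logical/physical ratio; measure it on a sample
    max_tb_per_node   -- storage cap per TiKV instance (2 TB in this thread)
    """
    if compression_ratio <= 0 or max_tb_per_node <= 0:
        raise ValueError("compression_ratio and max_tb_per_node must be positive")
    physical_tb = logical_tb * replicas / compression_ratio
    # Each replica must live on a different store, so never go below `replicas`.
    return max(replicas, math.ceil(physical_tb / max_tb_per_node))


if __name__ == "__main__":
    for ratio in (1.0, 3.0, 9.0):
        nodes = tikv_nodes_needed(30, compression_ratio=ratio)
        print(f"compression {ratio:>3}x -> {nodes} TiKV instances")
```

With no compression this reproduces the 45 instances from the question; at an assumed 3x ratio it gives 15, and at 9x it gives 5. That spread is why measuring the ratio on a migrated sample matters before fixing the node count.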