Total Capacity Requirements for Data Migration with TiDB Lightning

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-lightning迁移数据时总容量要求

username: TiDBer_djgos04V

[TiDB Usage Environment] Production Environment
The official documentation states that the total storage space of the target TiKV cluster must be greater than data source size × number of replicas × 2. For example, with the default of 3 replicas, the total storage space must be more than 6 times the size of the data source. So if I have 1TB of data in MySQL, the TiDB cluster needs at least 6TB of free space before migration. But doesn't TiDB compress the data? Wouldn't a large portion of this space be left unused?

Also, the official documentation mentions that Lightning needs a temporary folder when importing data. Suppose I run Lightning on a server that also hosts a TiKV component, with a 20GB data source, 35GB of free space on that server, and 150GB of total capacity in the cluster. Will the import succeed?
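The rule quoted in the question can be written out as a small calculation. This is only a sketch of the arithmetic from the post; the function name and parameters are hypothetical and not part of any TiDB tooling, and it does not model the temporary folder that Lightning needs.

```python
def required_cluster_space_gb(source_gb: float, replicas: int = 3,
                              safety_factor: float = 2.0) -> float:
    """Worst-case TiKV space from the rule quoted above:
    data source size x number of replicas x 2."""
    if source_gb < 0 or replicas < 1:
        raise ValueError("source_gb must be >= 0 and replicas >= 1")
    return source_gb * replicas * safety_factor


# 1TB of MySQL data with 3 replicas (1TB taken as 1024GB)
print(required_cluster_space_gb(1024))  # 6144.0, i.e. 6TB

# The 20GB scenario from the question
print(required_cluster_space_gb(20))    # 120.0
```

By this rule alone, the 20GB source calls for more than 120GB in the cluster, which is below the 150GB mentioned. Whether the import actually succeeds also depends on how much of that 150GB is really free and on the space available for the temporary folder.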

username: tidb菜鸟一只

The documented figure is a worst-case estimate. In practice, 1TB of data in MySQL usually ends up as roughly 1TB in TiDB, even with 3 replicas. The extra margin is there so that the import does not fail partway through for lack of space, which would create additional work.

username: zhanggame1

The original poster is right: the data is compressed, so some of that space will be freed up.
The compression ratio depends on the data, so it is best to import a sample first and measure the actual usage. For example, import 100GB as a test and use the measured disk overhead to estimate the total accurately.
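The test-import approach amounts to a linear extrapolation, sketched below. The function name and the sample figures are made up for illustration, and the sketch assumes the sample is representative of the full data set, which may not hold if tables compress very differently.

```python
def estimate_full_usage_gb(sample_source_gb: float,
                           sample_disk_used_gb: float,
                           total_source_gb: float) -> float:
    """Scale the disk usage measured for a sample import up to the full source."""
    if sample_source_gb <= 0:
        raise ValueError("sample_source_gb must be positive")
    # Disk used in the cluster per GB of source data, as measured on the sample
    overhead_ratio = sample_disk_used_gb / sample_source_gb
    return overhead_ratio * total_source_gb


# Hypothetical measurement: a 100GB sample occupies 150GB across the cluster
# (all replicas included); the full source is about 1TB (1000GB here).
print(estimate_full_usage_gb(100, 150, 1000))  # 1500.0
```

Measure the sample's disk usage only after the import has finished and usage has settled, since usage during the import can be higher than the final figure.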

username: redgame

Leave extra headroom. Data differs between environments, so you cannot know in advance how much compression you will get.

username: 昵称想不起来了

To be on the safe side, follow the official recommendation. The official guidance also suggests reserving space so that disk usage stays below the healthy watermark. Otherwise, you will run into problems using the cluster after the migration completes.