[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.5.1
I am deploying a new cluster that will hold 30 TB of data. Is there a recommended deployment plan for this size? How many TiKV nodes are needed, and how much storage should each TiKV have? Any guidance would be appreciated.
Take future data growth and the workload type (AP or TP) into account. In general, follow the official recommendation to deploy each component on its own machines, and validate the plan by stress testing with your actual business workload.
Cloud servers are easy to manage because you can scale as needed. I am on physical machines, where procurement is very troublesome, so the disks need plenty of headroom to absorb growth over the next few years.
To plan the cluster, first check what machines are available. If you want fewer machines, each disk needs to be larger and each machine needs more memory and CPU. In that case, you can deploy 2 to 3 instances of the same component on a single machine.
As a rough estimate, about 9 TiKV machines are needed: 30 TB of data / 50% disk usage = 60 TB of raw capacity; 60 TB / 3.5 TB per disk ≈ 17.1 disks; 17.1 disks / 2 disks per machine ≈ 8.6, rounded up to 9.
If you want a lower disk usage rate, increase the number of machines accordingly.
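The estimate above can be written as a small calculation so the inputs are easy to change. This is only a sketch of the same rough formula: the function name `tikv_machines` and its parameters are made up for illustration, and it treats the 30 TB as the total footprint on TiKV, since the thread does not say whether that figure already includes replicas or compression.

```python
import math

def tikv_machines(data_tb, disk_usage, disk_tb, disks_per_machine):
    """Rough TiKV machine count, using the formula from the reply above."""
    # Raw capacity needed so the data fills only `disk_usage` of the disks.
    raw_capacity_tb = data_tb / disk_usage
    # Number of disks of the given size, then machines, rounded up.
    disks = raw_capacity_tb / disk_tb
    return math.ceil(disks / disks_per_machine)

# 30 TB, 50% usage, 3.5 TB disks, 2 disks per machine
print(tikv_machines(30, 0.5, 3.5, 2))  # 9
# Lower target usage (40%) needs more machines
print(tikv_machines(30, 0.4, 3.5, 2))  # 11
# 3 disks per machine needs fewer machines
print(tikv_machines(30, 0.5, 3.5, 3))  # 6
```

Keep in mind that the result is a starting point for the stress test, not a substitute for it.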
If a single machine has enough memory and CPU, for example 256 GB of memory and 100 vCPUs, you can consider attaching 3 disks per machine, depending on the business situation.
Also add 3 PD nodes. Whether TiDB nodes can share machines with PD depends on the business situation: if the business runs heavy AP analysis or there is a risk of OOM, deploy TiDB on its own machines.
The monitoring components and the control machine can be placed together on 1 machine. Whether to add TiFlash and TiCDC depends on business needs.
A single machine can run multiple TiKV instances, with one instance per disk. As long as the resources are sufficient and the parameters are properly configured, there should be no problem.
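As an illustration of "parameters properly configured", here is a sketch that splits one machine's memory and CPU across its TiKV instances. The ratios (about half of memory for `storage.block-cache.capacity`, about 80% of cores for `readpool.unified.max-thread-count`, each divided by the instance count) are the rules of thumb I recall from the official hybrid deployment guidance, so please verify them against the docs for your version. The helper name `per_instance_settings` is hypothetical.

```python
def per_instance_settings(mem_gb, vcpus, instances):
    """Split one machine's resources across several TiKV instances.

    Assumed rules of thumb (check the official docs for your version):
      block cache  = total memory * 0.5 / instances
      read threads = total cores  * 0.8 / instances
    Integer arithmetic is used so results are rounded down.
    """
    block_cache_gb = mem_gb // (2 * instances)
    max_threads = (vcpus * 8) // (10 * instances)
    return {
        "storage.block-cache.capacity": f"{block_cache_gb}GB",
        "readpool.unified.max-thread-count": max_threads,
    }

# The 256 GB / 100 vCPU machine from the reply above, with 3 TiKV instances
print(per_instance_settings(256, 100, 3))
# {'storage.block-cache.capacity': '42GB', 'readpool.unified.max-thread-count': 26}
```

When several TiKV instances share a machine, it is also usual to give them a common host label (`server.labels`) and set `replication.location-labels` in PD, so that replicas of the same Region are not all placed on one physical machine.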