【TiDB Usage Environment】Production Environment
【TiDB Version】
【Encountered Issues: Problem Phenomenon and Impact】
【Resource Configuration】Cluster built with 5 machines, each with 64GB RAM and 32 cores
How should I plan this cluster so that it is both safe and reasonably laid out?
TiFlash is not needed, so what is the best layout without it?
Can TiDB and PD be deployed on just 2 machines?
If the business volume is large, I recommend running more TiDB nodes, since TiDB is quite resource-intensive; go up to 4 nodes. PD does not consume many resources, so deploy 3 or 5 PD nodes, because the documentation calls for an odd number. For TiKV, 4 or 5 nodes are recommended, and I think 4 are sufficient. Use the remaining machine for TiUP and monitoring.
I only have 5 machines in total. Following your suggestion, the layout would be:
tidb 101, 102, 103, 104
pd 101, 102, 103
tikv 101, 102, 103, 104
monitoring 105
Would this layout run without problems?
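To make the proposed layout easier to reason about, here is a small sketch that lists which components end up sharing each host. The helper names are hypothetical and exist only for this illustration; the host numbers are the ones from the post.

```python
from collections import defaultdict

# Layout exactly as proposed in the post (hosts 101-105).
layout = {
    "tidb": [101, 102, 103, 104],
    "pd": [101, 102, 103],
    "tikv": [101, 102, 103, 104],
    "monitoring": [105],
}


def components_per_host(layout):
    """Return {host: sorted list of components placed on that host}."""
    hosts = defaultdict(list)
    for component, machines in layout.items():
        for machine in machines:
            hosts[machine].append(component)
    return {host: sorted(comps) for host, comps in sorted(hosts.items())}


def shared_pd_tikv_hosts(layout):
    """Hosts where PD and TiKV run side by side."""
    return sorted(set(layout["pd"]) & set(layout["tikv"]))


per_host = components_per_host(layout)
for host, comps in per_host.items():
    print(host, comps)
print("PD and TiKV share hosts:", shared_pd_tikv_hosts(layout))
```

In this layout every PD instance sits on a machine that also runs TiKV, so whether it is safe depends on how well the CPU is isolated between the two.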
Your machines are quite powerful, though, so you shouldn't need to compromise. If you use NUMA binding to isolate CPU resources, you can run a hybrid deployment with several components on each machine.
Only on machines with a single NUMA node do you need to consider putting PD and TiDB on one machine and TiKV on a separate one. Without NUMA isolation of CPU resources, TiKV can max out the CPU under heavy write pressure, which makes a PD on the same machine unresponsive and can bring down the entire cluster.
As long as CPU resources can be isolated, there is no such concern, and they can be deployed together.
The documentation has templates for this, but you'll need to calculate and adjust the parameters yourself. Given how powerful your machines are, I would try hybrid deployment first and switch to a separated layout only if it doesn't work out.
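As a rough idea of what "calculating the parameters" can look like, here is a minimal sketch. It assumes TiKV is given a fixed share of the machine (for example one of two NUMA nodes) and scales two TiKV settings to that share instead of to the whole machine. The function name, the 0.5 share, and the 0.45 and 0.8 fractions are assumptions for illustration; take the actual values from the hybrid deployment template in the documentation.

```python
def tikv_settings(machine_mem_gb, machine_cores, tikv_share,
                  cache_fraction=0.45, thread_fraction=0.8):
    """Scale TiKV memory and thread settings to its share of a shared host.

    tikv_share, cache_fraction and thread_fraction are illustrative
    assumptions, not values taken from this thread.
    """
    tikv_mem_gb = machine_mem_gb * tikv_share
    tikv_cores = machine_cores * tikv_share
    return {
        "storage.block-cache.capacity": f"{int(tikv_mem_gb * cache_fraction)}GB",
        "readpool.unified.max-thread-count": max(4, int(tikv_cores * thread_fraction)),
    }


# 64GB RAM and 32 cores per machine, with TiKV assumed to get half of each.
settings = tikv_settings(64, 32, tikv_share=0.5)
for key, value in settings.items():
    print(key, "=", value)
```

The point is only that the limits should reflect TiKV's slice of the machine, so that TiDB and PD on the same host keep the rest.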
I run only 2 PDs, purely for lack of resources; it is a small operation.
I don't feel safe with just 1, and I can't fit 3. Even with that compromise, monitoring still has to share a machine with other services.
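For reference, the reason an odd PD count is usually recommended comes down to majority voting. The sketch below assumes the standard majority-quorum rule used by PD's embedded etcd; the function names are made up for this example.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} PD: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, 2 PDs tolerate no more failures than 1 does, because both members are needed for a majority; the extra fault tolerance only appears at 3.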