Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 这样的部署架构可以吗有更好的方案吗 (Is this deployment architecture okay, or is there a better plan?)
Four machines with eight cores and 16GB of RAM each.
According to the recommended deployment plan on the official website, TiFlash is deployed on a machine of its own. What are the benefits of this?
The number of PDs should be odd.
Three PDs are enough; normally only one is the leader while the other two are standbys. The key issue is the hardware: a machine with 16GB of memory cannot support deploying TiDB and TiKV together, as it may run out of memory at any time. It's best to increase the memory to 32GB or 64GB.
3 PDs and 4 TiKVs are recommended, but aren't the TiDB and TiFlash counts a bit high?
His TiFlash is on a separate machine.
For production environments, 2 TiFlash nodes with 2 replicas are recommended to achieve high availability.
Using 4 machines (8 cores, 16GB RAM) to deploy a TiDB cluster, this configuration is not quite suitable for a production environment, so I infer that you intend to deploy a test cluster.
Assuming your traffic and data volume are not too large and the cluster can handle it:
- The number of PD instances should be adjusted to an odd number; 3 is recommended here. PD is a distributed component based on etcd with a leader election protocol, and per best practice an odd count is the most reasonable.
- It is recommended to have at least 2 TiDB instances, and you can add more if there are sufficient machine resources.
- TiKV by default stores 3 replicas of data, so it is recommended to deploy at least 3 instances on different machines. Deploying 4 TiKV instances should be fine. However, it is recommended that the deployment disk for TiKV be SSD, otherwise, there may be performance issues, which need to be noted in advance.
- TiFlash can be understood as a "fourth replica" of TiKV data, stored in columnar format, and it supports multiple data disks on a single node. TiFlash mainly serves analytical computing tasks, i.e., AP scenarios, so it can consume a lot of compute resources (memory, CPU, IO, etc.); it is best given dedicated machine resources, i.e., deployed separately from the other components. Even if TiFlash fails, no data is lost: after the node is restarted or redeployed, it resumes synchronizing from TiKV, so you can deploy as many nodes as your workload needs. For production environments, at least 2 nodes are recommended to ensure high availability and data safety.
- Prometheus and Grafana nodes are mainly for cluster monitoring components and do not consume much resources, so place them as needed. For production environments, it is recommended to deploy them separately.
For specific deployment configurations, the official documentation provides detailed descriptions and instructions, so it is recommended to read the official deployment documentation carefully and repeatedly before proceeding.
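To make the layout above concrete, here is a minimal TiUP topology sketch. The hosts 10.0.1.1 through 10.0.1.4 are placeholders for your four machines; with only four machines, TiFlash and one TiKV inevitably share a host, which deviates from the "dedicated TiFlash machine" ideal, so adjust to taste:

```yaml
# Hypothetical topology.yaml — hosts, paths, and the exact instance
# counts are assumptions; tune them to your own machines.
global:
  user: tidb
  deploy_dir: /tidb-deploy
  data_dir: /tidb-data

pd_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3

tidb_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2

tikv_servers:          # SSDs recommended for data_dir
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3
  - host: 10.0.1.4

tiflash_servers:
  - host: 10.0.1.4     # ideally a dedicated machine; shared here for lack of hosts

monitoring_servers:
  - host: 10.0.1.4
grafana_servers:
  - host: 10.0.1.4
```

A cluster described this way would be deployed with `tiup cluster deploy <name> <version> topology.yaml` and started with `tiup cluster start <name>`.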
Let me ask you a question: Deploying one TiFlash node versus two nodes, aside from data high availability and security, will there be an improvement for a single slow SQL query?
In theory, more nodes mean more compute and thus better performance, but simply adding replicas doesn't improve the speed of a single query.
Doesn’t multiple nodes mean multiple replicas?
Not necessarily, for example, a 3-node setup can have 2 replicas.
So I understand that if there are three machines and two replicas, isn’t one of them not being used?
It is not recommended to set it to 2 replicas. If one machine fails, the entire cluster will be unusable. The default is three replicas.
I just checked the official website, and this is how I understand it: I currently have four TiFlash nodes deployed. If I create only one replica using SQL, will the data of this single replica be distributed across the four nodes?
With four 8-core 16GB machines, 2 TiDB, 3 PD, 3 TiKV, and 1 TiFlash are recommended. With more instances than that, nothing will run properly because all resources will be exhausted. Additionally, when instances share a machine, it is recommended to mount the data directories of PD, TiKV, and TiFlash on different disks, and to apply NUMA resource isolation if possible. Even with this configuration, I suspect a modest amount of data could bring it down…
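The disk-separation and NUMA suggestions above map to per-instance fields in a TiUP topology. A sketch, with placeholder hosts, mount points, and NUMA node IDs (the `numa_node` field requires `numactl` installed on the target host):

```yaml
# Fragment of a hypothetical topology.yaml — paths and node IDs are examples.
pd_servers:
  - host: 10.0.1.2
    data_dir: /data2/pd          # separate disk from TiKV on the same machine
tikv_servers:
  - host: 10.0.1.2
    data_dir: /data1/tikv        # dedicated SSD for TiKV
    numa_node: "0"               # pin TiKV to NUMA node 0
tiflash_servers:
  - host: 10.0.1.4
    data_dir: /data3/tiflash
    numa_node: "1"               # keep TiFlash off TiKV's NUMA node
```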
That’s right, so you can take advantage of MPP features.
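For reference, TiFlash replicas are configured per table with SQL. With one replica and four TiFlash nodes, that replica's regions are distributed across the nodes, which is exactly what MPP execution exploits. The schema and table names below are just examples:

```sql
-- Create one TiFlash replica for the table; its regions will be
-- spread across the available TiFlash nodes.
ALTER TABLE test.t SET TIFLASH REPLICA 1;

-- Check replication progress (AVAILABLE = 1 means the replica is ready).
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
```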
Does 2 replicas refer to TiFlash, or does TiKV still require 3 replicas?
4 machines each with 1 PD, 1 TiDB, 1 TiKV, and 1 TiFlash?
This setup is not recommended.
That setup is even less stable than putting 1 PD and 1 TiDB on one machine and 3 TiKVs on three separate machines; monitoring and PD can be squeezed onto the same machine.
You'll see it yourself once you test after deployment: under heavy IO pressure, TiKV consumes a lot of CPU whether reading or writing, which easily makes PD and TiDB compete for CPU. TSO wait times become very high, and in extreme cases the whole cluster can crash.
There’s no rush to deploy TiFlash. Once OLTP is stable, you can then proceed with OLAP without any issues.