Is this deployment architecture feasible? Are there better solutions?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 这样的部署架构可以吗有更好的方案吗

| username: TiDBer_7Q5CQdQd

Four machines with eight cores and 16GB of RAM each.

According to the recommended deployment plan on the official website, TiFlash is deployed on a dedicated machine. What are the benefits of this?

| username: 像风一样的男子 | Original post link

The number of PDs should be odd.

| username: zhanggame1 | Original post link

Three PDs are enough; normally only one is working while the other two are backups. The key issue is the configuration: a machine with 16GB of memory cannot support TiDB and TiKV deployed together, as it may run out of memory at any time. It’s best to increase the memory to 32GB or 64GB.
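
For reference, if TiDB and TiKV do end up co-located on 16GB machines, capping TiDB's memory use reduces the OOM risk. A minimal sketch, assuming TiDB v6.4 or later (the variable names are real system variables; the values are placeholders to size for your machines):

```sql
-- Cap the whole tidb-server process (available since TiDB v6.4);
-- leave headroom for the OS and a co-located TiKV on a 16GB machine.
SET GLOBAL tidb_server_memory_limit = '8GB';

-- Limit a single query's memory before it is cancelled or spilled
-- (value is in bytes; 1GB here).
SET GLOBAL tidb_mem_quota_query = 1073741824;
```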

| username: 啦啦啦啦啦 | Original post link

3 PDs and 4 TiKVs are recommended; aren’t 4 TiDB and 4 TiFlash instances a bit too many?

| username: 啦啦啦啦啦 | Original post link

Where did you see this?

| username: TiDBer_7Q5CQdQd | Original post link

His TiFlash is on a separate machine.

| username: 啦啦啦啦啦 | Original post link

For production environments, it is recommended to deploy TiFlash with 2 nodes and 2 replicas to achieve high availability.
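
For context, the TiFlash replica count is set per table with DDL; a minimal sketch (the table name `orders` is hypothetical):

```sql
-- Two TiFlash replicas for this table: the data survives the loss
-- of one TiFlash node.
ALTER TABLE orders SET TIFLASH REPLICA 2;
```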

| username: Jellybean | Original post link

Deploying a TiDB cluster on 4 machines (8 cores, 16GB RAM each) is not really suitable for a production environment, so I infer that you intend to set up a test cluster.

Assuming your traffic and data volume are not too large and the cluster can handle it:

  • The number of PD instances should be odd; 3 is recommended here. PD is a distributed component built on etcd and uses a leader-election protocol, so by best practice an odd number of instances is the most reasonable (a topology sketch follows this list).
  • At least 2 TiDB instances are recommended; you can add more if machine resources allow.
  • TiKV stores 3 replicas of data by default, so deploy at least 3 instances on different machines; 4 TiKV instances would also be fine. Note in advance, though, that TiKV should be deployed on SSDs, otherwise there may be performance issues.
  • TiFlash can be understood as the “fourth replica” of TiKV data, stored in columnar format, and it supports multiple data disks on a single node. TiFlash mainly serves analytical (AP) workloads, so it can consume a lot of compute resources (memory, CPU, IO, etc.); it is best to give it dedicated machines, i.e., deploy it separately. Even if TiFlash fails, no data is lost: after the node is restarted or redeployed it resumes synchronizing from TiKV, so you can deploy as many nodes as you need. For production environments, at least 2 nodes are recommended for high availability and data safety.
  • Prometheus and Grafana are the cluster monitoring components and do not consume many resources, so place them wherever convenient. For production environments, deploying them separately is recommended.
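
Putting the above together, a tiup topology sketch along these lines might look as follows. The IP addresses and directories are placeholders; machines 1–3 share PD/TiDB/TiKV, and machine 4 is left to TiFlash and monitoring:

```yaml
# Hypothetical 4-machine test topology (hosts and paths are placeholders).
global:
  user: "tidb"
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"
pd_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3
tidb_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
tikv_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3
tiflash_servers:
  - host: 10.0.1.4        # TiFlash on its own machine, per the advice above
monitoring_servers:
  - host: 10.0.1.4
grafana_servers:
  - host: 10.0.1.4
```

Such a file would then be deployed with `tiup cluster deploy <cluster-name> <version> topology.yaml`.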

For specific deployment configurations, the official documentation provides detailed descriptions and instructions, so it is recommended to read the official deployment documentation carefully before proceeding.

| username: TiDBer_7Q5CQdQd | Original post link

Let me ask a question: compared with deploying one TiFlash node, will two nodes improve a single slow SQL query, aside from data high availability and safety?

| username: 啦啦啦啦啦 | Original post link

In theory, the more nodes, the better the performance; adding replicas by itself, though, doesn’t really improve performance.
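
One way to see whether extra TiFlash nodes can actually parallelize a given slow query is to check its plan; a sketch, assuming a hypothetical `orders` table that already has a TiFlash replica:

```sql
-- If the optimizer picks MPP, the plan shows mpp[tiflash] tasks with
-- ExchangeSender/ExchangeReceiver operators; work on such a plan is
-- spread across all TiFlash nodes, so a second node can speed it up.
EXPLAIN SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
```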

| username: TiDBer_7Q5CQdQd | Original post link

Doesn’t having multiple nodes mean multiple replicas?

| username: 啦啦啦啦啦 | Original post link

Not necessarily, for example, a 3-node setup can have 2 replicas.

| username: TiDBer_7Q5CQdQd | Original post link

So my understanding is: with three machines and two replicas, wouldn’t one of the machines go unused?

| username: 像风一样的男子 | Original post link

Setting 2 replicas is not recommended: if one machine fails, the entire cluster becomes unavailable. The default is 3 replicas.

| username: TiDBer_7Q5CQdQd | Original post link

I just checked the official website, and this is how I understand it: I currently have four TiFlash nodes deployed. If I create only one replica using SQL, will the data of this single replica be distributed across the four nodes?

| username: tidb菜鸟一只 | Original post link

With four 8-core 16GB machines, 2 TiDB, 3 PD, 3 TiKV, and 1 TiFlash are recommended; with any more instances nothing will run properly, because all resources will be exhausted. Additionally, it is recommended to mount the data directories of PD, TiKV, and TiFlash on different disks of the same machine, and to apply NUMA resource isolation if possible. Even with this configuration, I suspect a small amount of data might be enough to make it fall over…
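
As a sketch of the disk/NUMA isolation mentioned above, tiup topology entries accept per-instance `data_dir` and `numa_node` fields (hosts and paths are placeholders; `numa_node` requires numactl installed on the machine):

```yaml
tikv_servers:
  - host: 10.0.1.1
    data_dir: /disk1/tidb-data/tikv   # TiKV on its own disk
    numa_node: "0"                    # pin TiKV to NUMA node 0
pd_servers:
  - host: 10.0.1.1
    data_dir: /disk2/tidb-data/pd     # PD on a different disk
    numa_node: "1"                    # keep PD off TiKV's NUMA node
```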

| username: 啦啦啦啦啦 | Original post link

That’s right, so you can take advantage of MPP features.
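
This can be confirmed from `information_schema`; a minimal check:

```sql
-- REPLICA_COUNT is the logical replica number you requested; the
-- regions of that single replica are spread across the available
-- TiFlash nodes, with AVAILABLE/PROGRESS showing the sync state.
SELECT table_schema, table_name, replica_count, available, progress
FROM information_schema.tiflash_replica;
```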

| username: 啦啦啦啦啦 | Original post link

Does 2 replicas refer to TiFlash, or does TiKV still require 3 replicas?
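
They are two independent settings; a quick way to check the TiKV side (the PD `replication.max-replicas` setting, default 3):

```sql
-- The row-store (TiKV) replica count is a cluster-wide PD setting:
SHOW CONFIG WHERE type = 'pd' AND name = 'replication.max-replicas';
-- TiFlash replicas are configured per table with
-- ALTER TABLE ... SET TIFLASH REPLICA n, independently of the above.
```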

| username: 有猫万事足 | Original post link

4 machines each with 1 PD, 1 TiDB, 1 TiKV, and 1 TiFlash?

This setup is not recommended.
Its stability is even worse than putting 1 PD and 1 TiDB on one machine and 3 TiKVs on 3 separate machines; the monitoring components and PD can be squeezed onto the same machine.

In any case, you will run tests after deployment, and you will find that when TiKV is under heavy IO pressure it consumes a lot of CPU, whether reading or writing. That easily makes PD and TiDB compete for CPU resources; TSO wait time becomes very high, and in extreme cases the whole cluster can crash.

There’s no rush to deploy TiFlash. Once OLTP is stable, you can then proceed with OLAP without any issues.