Production Environment Deployment Plan, Seeking Suggestions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 生产环境计划部署方案,大家给点建议

| username: Dais

[TiDB Usage Environment] Production Environment
[TiDB Version] V7.1.2
I have prepared 3 physical machines for the production environment.
Each server: 54 cores / 128 GB memory / 2 NUMA nodes
Planned resource allocation per instance:
TiDB: 20C/48G
PD: 8C/16G
TiKV: 20C/54G
TiFlash: 24C/54G
Monitoring: 4C/8G
Grafana: 4C/8G

Number of instances:
TiDB: 2
PD: 3
TiKV: 3
TiFlash: 1
Monitoring: 1
Grafana: 1

The 3 machines will run a mixed (hybrid) deployment, with instances isolated by NUMA binding plus cgroup limits.
Deployment diagram:

Topology file:

    # Global variables are applied to all deployments and used as the default value of
    # the deployments if a specific deployment value is missing.
    global:
      user: "tidb"
      ssh_port: 22
      deploy_dir: "/home/tidb/tidb-deploy"
      data_dir: "/home/tidb/tidb-data"

    monitored:
      node_exporter_port: 9100
      blackbox_exporter_port: 9115

    server_configs:
      tidb:
        log.slow-threshold: 300
        binlog.enable: false
        binlog.ignore-error: false
        performance.max-procs: 20
      tikv:
        readpool.storage.use-unified-pool: false
        readpool.coprocessor.use-unified-pool: true
      pd:
        schedule.leader-schedule-limit: 4
        schedule.region-schedule-limit: 2048
        schedule.replica-schedule-limit: 64

    pd_servers:
      - host: 192.168.0.1
        resource_control:
          memory_limit: "16G"
          cpu_quota: "800%"
      - host: 192.168.0.2
        resource_control:
          memory_limit: "16G"
          cpu_quota: "800%"
      - host: 192.168.0.3
        resource_control:
          memory_limit: "16G"
          cpu_quota: "800%"

    tidb_servers:
      - host: 192.168.0.1
        numa_node: "0"
        resource_control:
          memory_limit: "48G"
          cpu_quota: "2000%"
      - host: 192.168.0.2
        numa_node: "0"
        resource_control:
          memory_limit: "48G"
          cpu_quota: "2000%"

    tikv_servers:
      - host: 192.168.0.1
        numa_node: "1"
        resource_control:
          memory_limit: "54G"
          cpu_quota: "2000%"
      - host: 192.168.0.2
        numa_node: "1"
        resource_control:
          memory_limit: "54G"
          cpu_quota: "2000%"
      - host: 192.168.0.3
        numa_node: "1"
        resource_control:
          memory_limit: "54G"
          cpu_quota: "2000%"

    tiflash_servers:
      - host: 192.168.0.3
        numa_node: "0"
        resource_control:
          memory_limit: "56G"
          cpu_quota: "2400%"

    monitoring_servers:
      - host: 192.168.0.1
        numa_node: "0"
        resource_control:
          memory_limit: "8G"
          cpu_quota: "400%"

    grafana_servers:
      - host: 192.168.0.2
        numa_node: "0"
        resource_control:
          memory_limit: "8G"
          cpu_quota: "400%"

Please take a look at the plan and see if it is reasonable, and if there are any suggestions for optimization and adjustments. Thank you.

| username: zhanggame1 | Original post link

You can check out "Three Nodes Hybrid Deployment Best Practices" in the PingCAP Documentation Center.

| username: Billmay表妹 | Original post link

In a production environment, hybrid deployment (several components sharing the same machines) is generally not recommended unless you are quite confident in your operational capabilities, because the components may contend for resources.

| username: zhanggame1 | Original post link

It would be best to remove TiFlash.

| username: Kongdom | Original post link

In a mixed deployment scenario, it is recommended to remove the TiFlash node: it adds little value here and can easily drag down the TiKV node on the same server.

| username: 小龙虾爱大龙虾 | Original post link

It is best to deploy TiFlash separately. It would be great if you could get another machine for it.

| username: Dais | Original post link

Well, that’s it, I’m completely lacking confidence now. I was a beginner to start with.

| username: Dais | Original post link

In that case, can I remove TiFlash first and add a machine to expand TiFlash later when the business volume increases?

| username: TI表弟 | Original post link

Deploy them separately. Hybrid deployment might be convenient in the short term, but it will be a disaster if something goes wrong.

| username: zhanggame1 | Original post link

How about using a virtual machine?

| username: Miracle | Original post link

If the business volume increases in the future and TiFlash is needed, the current configuration will most likely be insufficient. Hybrid deployment may only be suitable for small-scale businesses with little growth. If the future load is within a controllable range, I think hybrid deployment is not a big problem. Proper planning in the early stages will save a lot of trouble later.

| username: forever | Original post link

Your deployment packs a bit too many components onto each machine.

| username: 随缘天空 | Original post link

It is not recommended to deploy in a mixed environment, as it is difficult to troubleshoot issues. If resources are insufficient, it is better to set up a single node.

| username: Fly-bird | Original post link

Deploy TiFlash independently.

| username: 像风一样的男子 | Original post link

It is recommended to use these 3 physical machines for TiKV only, with each machine running 2 TiKV instances (one per 64 GB of memory), for a total of 6 TiKV nodes. Then, apply for some virtual machines to deploy TiDB and PD. TiFlash is not very useful at the start.
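
For illustration, here is a minimal sketch of how two TiKV instances on one physical machine might be declared in a TiUP topology, assuming one instance per NUMA node. The ports, directory names, and the `machine1` label value are assumptions made for this example, not values from this thread, and only the first of the three machines is shown.

```yaml
# Sketch only: ports, directories, and label values are assumed for illustration.
server_configs:
  pd:
    # Ask PD to place replicas of a Region on different physical hosts,
    # so that losing one machine does not take out two replicas at once.
    replication.location-labels: ["host"]

tikv_servers:
  # First instance on machine 1, bound to NUMA node 0.
  - host: 192.168.0.1
    port: 20160
    status_port: 20180
    numa_node: "0"
    deploy_dir: "/home/tidb/tidb-deploy/tikv-20160"
    data_dir: "/home/tidb/tidb-data/tikv-20160"
    config:
      server.labels: { host: "machine1" }
  # Second instance on the same machine, bound to NUMA node 1.
  - host: 192.168.0.1
    port: 20161
    status_port: 20181
    numa_node: "1"
    deploy_dir: "/home/tidb/tidb-deploy/tikv-20161"
    data_dir: "/home/tidb/tidb-data/tikv-20161"
    config:
      server.labels: { host: "machine1" }
```

Both instances carry the same `host` label because they share one physical machine. When several TiKV instances share a host, per-instance memory settings such as `storage.block-cache.capacity` usually also need to be set explicitly so that the instances do not each size themselves for the whole machine.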

| username: come_true | Original post link

Aren’t three physical machines too few for a production environment?

| username: come_true | Original post link

That’s right, it’s better to deploy separately in a production environment. It’s easier to troubleshoot issues and the performance is better.

| username: come_true | Original post link

For beginners, it’s better to deploy separately. When you become very proficient, you can then consider mixed deployment. This way, if any issues arise, you’ll know how to troubleshoot them.

| username: 小龙虾爱大龙虾 | Original post link

Sure, if needed, TiFlash can be expanded later.
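
As a rough sketch of that later expansion, a TiUP scale-out file that adds a single TiFlash node on a new, dedicated machine could look like the following. The file name `scale-out.yml` and the address `192.168.0.4` are hypothetical; adjust them to the actual environment.

```yaml
# scale-out.yml (hypothetical file name): adds one TiFlash node on a new machine.
tiflash_servers:
  - host: 192.168.0.4   # assumed address of the new, dedicated machine
```

A file like this is applied with `tiup cluster scale-out <cluster-name> scale-out.yml`. After the node joins, TiFlash replicas still have to be created per table, for example with `ALTER TABLE <table> SET TIFLASH REPLICA 1`.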

| username: oceanzhang | Original post link

It can be done with one machine, but if problems occur, it can be a bit of a headache. I suggest at least separating TiKV and TiDB.