How to Calculate the Reliability of a TiDB Cluster

translator_bot · June 22, 2024, 2:03pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何计算tidb集群的可靠性

| username: haoshuaili

Assume there are 3 physical machines.
Each physical machine has 10 disks.
Each disk corresponds to one TiKV process.
The reliability of a single disk is 95%.
The reliability of the physical machine excluding the disks is 98%.
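
As a starting point, the figures above can be combined with basic probability. The following is only a minimal sketch: it assumes that all failures are independent and that a TiKV process is up exactly when both its disk and its host are healthy. The function names are hypothetical and are not part of any TiDB tooling.

```python
# Figures taken from the question; everything else is an illustrative assumption.
DISK_RELIABILITY = 0.95   # reliability of a single disk
HOST_RELIABILITY = 0.98   # reliability of a machine, excluding its disks
DISKS_PER_HOST = 10
HOSTS = 3


def tikv_up_probability(host: float = HOST_RELIABILITY,
                        disk: float = DISK_RELIABILITY) -> float:
    """Probability that one TiKV process is up (its host AND its disk are healthy)."""
    return host * disk


def all_tikv_up_probability(host: float = HOST_RELIABILITY,
                            disk: float = DISK_RELIABILITY,
                            disks_per_host: int = DISKS_PER_HOST,
                            hosts: int = HOSTS) -> float:
    """Probability that every TiKV process in the cluster is up at the same time."""
    one_host_fully_up = host * disk ** disks_per_host
    return one_host_fully_up ** hosts


if __name__ == "__main__":
    print(f"single TiKV up: {tikv_up_probability():.4f}")
    print(f"all 30 TiKV up: {all_tikv_up_probability():.4f}")
```

Under these assumptions a single TiKV process is up about 93% of the time, while the chance that all 30 are up simultaneously is only about 20%, which is why the answer depends heavily on how much failure the cluster can tolerate.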

| username: xfworld

RTO and RPO are two key concepts in Business Continuity (BC) and Disaster Recovery (DR), and they are also two critical metrics in the Service Level Agreements (SLAs) of products like this.
Recovery Point Objective (RPO) is the maximum amount of data, measured as a span of time before the failure, that may be lost.
Recovery Time Objective (RTO) is the maximum time the entire system may take to return to normal after a disaster occurs.
Put simply, RPO looks backward from the disaster (how much recent data is lost), and RTO looks forward from it (how long recovery takes).

Perhaps you could base the calculation on these two metrics?

| username: tidb菜鸟一只

It also depends on how many replicas you have configured. With a replica count of 1, the failure of any single disk will bring your cluster down.
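
To illustrate how much the replica count matters, here is a rough sketch for a single Region. It assumes that TiKV's Raft replication keeps a Region available as long as a majority of its replicas are up, that each replica sits on a different physical machine, that failures are independent, and that one TiKV store is up with probability 0.98 × 0.95 = 0.931 (host and disk both healthy, using the numbers from the question). The function name is hypothetical.

```python
from math import comb

# Assumed probability that one TiKV store is up: host (0.98) AND disk (0.95) healthy.
STORE_UP = 0.98 * 0.95


def region_availability(replicas: int, p_up: float = STORE_UP) -> float:
    """Probability that a majority of a Region's replicas are up.

    Assumes replicas fail independently, i.e. each one is on a different host.
    """
    quorum = replicas // 2 + 1
    return sum(
        comb(replicas, k) * p_up ** k * (1 - p_up) ** (replicas - k)
        for k in range(quorum, replicas + 1)
    )


if __name__ == "__main__":
    for n in (1, 3):
        print(f"{n} replica(s): {region_availability(n):.4f}")
```

With one replica the Region is only as reliable as its single store (about 0.931); with three replicas spread over the three machines it rises to roughly 0.986 under these assumptions. This covers one Region only; the figure for a whole cluster also depends on how Regions are distributed across the stores.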