Are the data volumes the same for each node in a 3-node 3-replica setup and a 5-node 5-replica setup?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 请教下3节点3副本和5节点5副本,每个节点的数据量是一样多的吗?

| username: TiDBer_Y2d2kiJh

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0
[Reproduction Path] I am estimating how much storage TiKV needs. Assuming all other conditions are the same and each TiKV node runs on its own server, is the data volume on each node the same for 3 nodes with 3 replicas as for 5 nodes with 5 replicas?

| username: 像风一样的男子 | Original post link

If other settings and disk sizes are the same, theoretically, each node in these two clusters will have the same amount of data.

| username: Kongdom | Original post link

My understanding is that the total data volume on each node is the same in both setups. What differs is the effective data (leader data) per node: with three replicas each node holds 50 as leader, and with five replicas each node holds 30.
Each node holds 1 share of leader data plus 2 (three replicas) or 4 (five replicas) shares of follower data.

Assuming the data volume is 150, in the ideal state:
For 3 nodes with 3 replicas, the primary (leader) data is 150 and the secondary (follower) data is 150 × 2 = 300.
Each server holds 150 units of data, and the total space used is 450 units.
Node A: 50 primary from A + 50 secondary from B + 50 secondary from C
Node B: 50 primary from B + 50 secondary from A + 50 secondary from C
Node C: 50 primary from C + 50 secondary from A + 50 secondary from B

For 5 nodes with 5 replicas, the primary (leader) data is 150 and the secondary (follower) data is 150 × 4 = 600.
Each server holds 150 units of data, and the total space used is 750 units.
Node A: 30 primary + 30 secondary + 30 secondary + 30 secondary + 30 secondary
Node B: 30 primary + 30 secondary + 30 secondary + 30 secondary + 30 secondary
Node C: 30 primary + 30 secondary + 30 secondary + 30 secondary + 30 secondary
Node D: 30 primary + 30 secondary + 30 secondary + 30 secondary + 30 secondary
Node E: 30 primary + 30 secondary + 30 secondary + 30 secondary + 30 secondary
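
The arithmetic above can be checked with a short Python sketch. It assumes a perfectly balanced cluster (leaders and followers spread evenly, no label or placement rules), and the function name `per_node_usage` is made up for illustration; it is not part of any TiDB tool.

```python
def per_node_usage(logical_data, nodes, replicas):
    """Return (leader, follower, total) data per node, assuming perfect balance."""
    total_stored = logical_data * replicas      # every replica is a full copy
    per_node_total = total_stored / nodes       # spread evenly across nodes
    per_node_leader = logical_data / nodes      # leaders are spread evenly too
    per_node_follower = per_node_total - per_node_leader
    return per_node_leader, per_node_follower, per_node_total


# Compare the two setups from this thread with 150 units of logical data.
for nodes, replicas in [(3, 3), (5, 5)]:
    leader, follower, total = per_node_usage(150, nodes, replicas)
    print(f"{nodes} nodes / {replicas} replicas: "
          f"leader={leader}, follower={follower}, total={total}")
```

Whenever the replica count equals the node count, the per-node total equals the logical data size, which is why both setups come out at 150 per node.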

| username: Jasper | Original post link

Without considering additional label configurations, it is the same amount.

| username: tidb菜鸟一只 | Original post link

Of course, it’s the same. With 3 nodes and 3 replicas, each node stores one replica. With 5 nodes and 5 replicas, each node still stores one replica. The difference is that with 3 nodes and 3 replicas, you can only afford to lose one node and still maintain normal service, whereas with 5 nodes and 5 replicas, you can afford to lose two nodes.
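
The fault-tolerance numbers follow from the Raft majority rule that TiKV uses: a Region stays available as long as more than half of its replicas are alive. A minimal sketch of that rule, with the hypothetical helper name `tolerable_failures`:

```python
def tolerable_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority still survives."""
    quorum = replicas // 2 + 1   # smallest group that is more than half
    return replicas - quorum


for replicas in (3, 5):
    print(f"{replicas} replicas: quorum={replicas // 2 + 1}, "
          f"can lose {tolerable_failures(replicas)}")
```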

| username: Jasper | Original post link

By your calculation each node still holds 150, so it's the same. Three replicas: 50 × 3 = 150; five replicas: 30 × 5 = 150.

| username: 大飞哥online | Original post link

Theoretically the same amount.

| username: Kongdom | Original post link

:joy: I left something out. What I meant is that the leader data on each node differs: with three replicas each node holds 50 as leader, and with five replicas each node holds 30. I'll correct my post.

| username: 像风一样的男子 | Original post link

How is it not the same…