Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TIDB 数据分片一些疑问 (Some questions about TiDB data sharding)

According to the official documentation, TiKV automatically shards the underlying data by Key Range and distributes it across all nodes in the cluster in units of Regions. Each Region’s data is stored as multiple replicas, and these replicas are placed on different nodes, forming a Raft Group.
So my understanding is this: if, for example, 990,000 rows come in, TiKV shards them by Range and distributes them across, say, 3 nodes, so that the load of all 990,000 rows does not land on a single node while the other two hold nothing. Each node would then store 330,000 rows. However, the documentation also says that a Region has multiple replicas and that these replicas are on different nodes. Does this mean that 330,000 rows are stored on node A with two replicas on nodes B and C, another 330,000 rows are stored on node B with replicas on nodes A and C, and the remaining 330,000 rows are stored on node C with replicas on nodes A and B? In that case, each node still stores 990,000 rows (for node A, its own 330,000 rows plus the replicas of the data from nodes B and C: 330,000 + 330,000 + 330,000 = 990,000), so the pressure on each node has not been reduced. Where did I go wrong in my understanding?

My other question: TiKV’s storage model is a Key-Value model, but what is MySQL’s storage model? I couldn’t find this information. I hope the experts can answer these two questions. Thank you.
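To make the arithmetic in my first question concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function `rows_per_node` is hypothetical, and the round-robin placement is a stand-in, not how TiKV or PD actually schedules replicas. The only rule it takes from the documentation is that the replicas of one Region sit on different nodes.

```python
def rows_per_node(total_rows, num_regions, num_replicas, num_nodes):
    """Count the rows stored on each node.

    Assumption (not TiKV's real scheduler): rows are split evenly into
    Regions, and the replicas of Region i are placed round-robin on
    nodes i, i+1, ..., so no two replicas of a Region share a node.
    """
    if num_replicas > num_nodes:
        raise ValueError("need at least as many nodes as replicas")

    rows_per_region = total_rows // num_regions
    stored = [0] * num_nodes
    for region in range(num_regions):
        for replica in range(num_replicas):
            node = (region + replica) % num_nodes
            stored[node] += rows_per_region
    return stored


# The case from the question: 3 Regions, 3 replicas, 3 nodes.
three_nodes = rows_per_node(990_000, num_regions=3, num_replicas=3, num_nodes=3)
print(three_nodes)  # [990000, 990000, 990000]

# A what-if with the same data and replica count, but 6 Regions on 6 nodes.
six_nodes = rows_per_node(990_000, num_regions=6, num_replicas=3, num_nodes=6)
print(six_nodes)  # [495000, 495000, 495000, 495000, 495000, 495000]
```

The first call reproduces my 330,000 + 330,000 + 330,000 = 990,000 per node. The second call is only a what-if under the same assumed placement: with more nodes than replicas, the per-node total in this toy model drops, which may be relevant to where my reasoning goes wrong.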