Some Questions About TiDB Data Sharding

translator_bot · June 21, 2024, 11:55am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB 数据分片一些疑问

| username: TiDBer_JlY1JCJ5

According to the official documentation, TiKV automatically shards the underlying data by Key Range and distributes it across all nodes in the cluster in units of Regions. Each Region’s data is stored as multiple replicas, and these replicas are placed on different nodes, forming a Raft Group.

So my understanding is this: suppose 990,000 rows come in. TiKV shards them by Range and spreads them across, say, 3 nodes, so that the load of all 990,000 rows does not land on a single node while the other two nodes have nothing. Each node would then store 330,000 rows.

However, I have a question. The documentation also says that a Region has multiple replicas, and that these replicas are on different nodes. Does this mean that 330,000 rows are stored on node A with two replicas on nodes B and C, another 330,000 rows are stored on node B with replicas on nodes A and C, and so on? In that case each node still stores 990,000 rows (its own 330,000 plus one replica set from each of the other two nodes: 330,000 + 330,000 + 330,000 = 990,000), so the pressure on each node hasn’t been reduced. Where did I go wrong in my understanding?

Another question: TiKV’s storage model is a Key-Value model, but what is MySQL’s storage model? I couldn’t find this information. I hope the experts can answer these two questions. Thank you.

translator_bot · June 21, 2024, 11:55am

| username: 像风一样的男子

Each Region has one leader, and the leaders of the Regions are spread across the three nodes. A query therefore reads a different set of 330,000 rows from each of the three nodes. Isn’t that distributing the load?

translator_bot · June 21, 2024, 11:55am

| username: dba远航

Writes go through the Leader of each Region, and the Leaders of different Regions sit on different nodes, so the write pressure is shared across the three nodes. Reads can also be served by Followers (when Follower Read is enabled), which shares the read pressure as well.
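
To make the routing idea concrete, here is a minimal Python sketch. It is only an illustration, not how TiDB or TiKV is implemented: the `Region` class, the `route` method, and the round-robin choice of follower are all hypothetical names and simplifications. In a real cluster, Follower Read is switched on through the `tidb_replica_read` system variable rather than a function argument.

```python
import itertools


class Region:
    """Toy model of one Region: one leader replica plus follower replicas."""

    def __init__(self, name, leader, followers):
        self.name = name
        self.leader = leader
        self.followers = list(followers)
        # Assumption: followers are picked round-robin; the real selection
        # logic in TiDB is more involved.
        self._next_follower = itertools.cycle(self.followers)

    def route(self, op, follower_read=False):
        """Return the node that would serve this request in the toy model."""
        if op == "write":
            return self.leader  # writes always go through the Raft leader
        if op == "read":
            if follower_read:
                return next(self._next_follower)
            return self.leader  # default: reads also go to the leader
        raise ValueError(f"unknown operation: {op}")


# Three Regions whose leaders sit on three different nodes.
regions = [
    Region("r1", leader="A", followers=["B", "C"]),
    Region("r2", leader="B", followers=["C", "A"]),
    Region("r3", leader="C", followers=["A", "B"]),
]

write_targets = [r.route("write") for r in regions]
default_read_targets = [r.route("read") for r in regions]
follower_read_targets = [regions[0].route("read", follower_read=True) for _ in range(2)]
```

In this sketch, writes and default reads for the three Regions land on three different nodes, while reads with `follower_read=True` for Region `r1` are served by its followers instead of its leader.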

translator_bot · June 21, 2024, 11:55am

| username: zhanggame1

Your understanding is correct. By default, with 3 TiKV nodes and 3 replicas, each TiKV node stores a full copy of the 990,000 rows. With the default configuration, read and write operations only access the leader of each Region, so the read and write load is still distributed across the 3 nodes.
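
The arithmetic can be checked with a small Python sketch. This is a simplified model under stated assumptions: `place_regions` is a hypothetical helper that places replicas round-robin, whereas real placement is decided by PD's scheduler. The 6-node case is an added illustration that is not part of the original question.

```python
from collections import Counter


def place_regions(num_regions, rows_per_region, nodes, replicas=3):
    """Round-robin replica placement (a simplification, not PD's scheduler).

    Returns two dicts: rows physically stored per node (all replicas), and
    rows per node for which that node holds the Region leader.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    stored = Counter()
    led = Counter()
    for r in range(num_regions):
        # Pick `replicas` distinct nodes for this Region, starting at offset r.
        peers = [nodes[(r + i) % len(nodes)] for i in range(replicas)]
        for node in peers:
            stored[node] += rows_per_region
        led[peers[0]] += rows_per_region  # assumption: first peer is the leader
    return dict(stored), dict(led)


# 3 nodes, 3 Regions of 330,000 rows each (990,000 rows in total).
stored3, led3 = place_regions(3, 330_000, ["A", "B", "C"])

# Same 990,000 rows, but 6 nodes and 6 Regions of 165,000 rows each.
stored6, led6 = place_regions(6, 165_000, ["A", "B", "C", "D", "E", "F"])
```

With 3 nodes, every node stores all 990,000 rows but leads only 330,000 of them, so the request load is split three ways even though the storage is not. With 6 nodes in this model, each node stores 495,000 rows and leads 165,000, so storage per node only starts to shrink once the cluster has more nodes than replicas.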

translator_bot · June 21, 2024, 11:55am

| username: andone

Each Region has a leader replica and follower replicas. The leader serves both reads and writes, while followers can only serve reads.
