Analysis of the Region PEERS Field

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: region PEERS 字段分析

| username: 小跑跑泡

First, I deployed TiDB normally with 3 TiDB, 3 TiKV, and 3 TiFlash.
(1) After deployment, I executed the command: set config pd replication.max-replicas=1 to set the number of region replicas to 1.
Question 1: Will this command store only one copy of the data in TiKV? I ask because, by default, the Raft protocol keeps three copies, which takes up too much disk space.
User answer: Yes
Question 2: Will storing one copy improve query efficiency? Will it improve write efficiency?
User answer: It will not improve query efficiency, but it will improve write efficiency.
Question 3: If a batch of data has already been imported into the database before executing this command, will the data before executing the command automatically be reduced to one copy?
User answer: Yes
Question 4: After setting the number of replicas to 1, will the PD server distribute the data randomly or evenly across the three TiKV nodes? It shouldn’t always target one node, right?
User answer: Evenly distributed
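For reference, a minimal sketch of how the setting from step (1) can be applied and then read back from a SQL client. It assumes the documented SET CONFIG form, in which a configuration name containing dots or hyphens is quoted with backticks; the filter on the type and name columns of SHOW CONFIG is just one way to narrow the output.

```sql
-- Ask PD to keep a single replica per region (name quoted with backticks).
SET CONFIG pd `replication.max-replicas` = 1;

-- Read the value back; one row is returned per PD instance.
SHOW CONFIG WHERE type = 'pd' AND name = 'replication.max-replicas';
```

PD removes the extra replicas in the background, so the peer count of existing regions may take some time to drop after the value changes.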

(2) Then execute the command ALTER TABLE Faces SET TIFLASH REPLICA 1; to synchronize a copy of the KV data to TiFlash.
I then checked the regions and found that some have one peer and some have two peers.


Question 5: A region with one peer should be stored on TiKV only, and a region with two peers should have one peer on TiKV and one on TiFlash, right?
Answer: Yes
Question 6: Then why is there one with only one peer? Is it because it hasn’t synchronized to TiFlash yet? Some have two peers.
User answer: .
Question 7: When querying using TiFlash, is there a possibility that the data has been stored in TiKV but not yet synchronized to TiFlash, leading to missing query data? Is there a solution?
User answer: No, if it hasn’t synchronized but needs to read from TiFlash, the read will be blocked and eventually timeout.
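To see whether the TiFlash replica requested in step (2) is actually ready, the replication state can be read from information_schema.tiflash_replica. This is a sketch only: the table name Faces comes from the post, and the string comparison may need adjusting if the table was created with different letter case.

```sql
-- AVAILABLE = 1 means the TiFlash replica can serve reads;
-- PROGRESS runs from 0 to 1 while the initial copy is being built.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 'Faces';
```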

| username: TiDBer_vfJBUcxl | Original post link

Set the number of replicas for the region to 1 with the command set config pd replication.max-replicas=1.

| username: TiDBer_vfJBUcxl | Original post link

Storing a single copy improves write efficiency, but it should not affect read efficiency.

| username: TiDBer_vfJBUcxl | Original post link

You can use this command to view the current number of replicas: SHOW config WHERE NAME LIKE '%max-replicas%';

| username: 小跑跑泡 | Original post link

So now the data in TiKV will be stored as only one copy, right?

| username: 小跑跑泡 | Original post link

Got it. Do you have any thoughts on questions 3 and 4?

| username: WalterWj | Original post link

Question 3: Yes, it will.
Question 4: Evenly distributed.

| username: WalterWj | Original post link

No, it won’t. If it hasn’t synchronized but needs to read from TiFlash, the read operation will get stuck and eventually time out and fail.
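One way to observe this behavior is to restrict a session to TiFlash and check where the plan runs. The sketch below assumes the session variable tidb_isolation_read_engines and the table Faces from the original post; the COUNT(*) query is only an illustration.

```sql
-- Allow only TiFlash (plus TiDB's own in-memory tables) for this session.
SET SESSION tidb_isolation_read_engines = 'tidb,tiflash';

-- The task column of the plan should mention tiflash rather than tikv.
EXPLAIN SELECT COUNT(*) FROM Faces;

-- Restore the default engine list afterwards.
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
```

If the TiFlash replica is not available for the table, the query is expected to fail or wait instead of silently returning partial data, which matches the answer above.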

| username: zhanggame1 | Original post link

I only know that 3 works, I haven’t tested the others yet.

| username: tidb菜鸟一只 | Original post link

Question 3: If a batch of data has already been imported into the database before this command is executed, will that existing data automatically be reduced to one copy?
Answer: Yes.

Question 4: After setting the number of replicas to 1, will the PD server distribute the data randomly or evenly across the three TiKV nodes? It shouldn’t always target one node, right?
Answer: It will be distributed across the three machines. Whether it is evenly distributed depends on whether your table is a clustered table and the primary key configuration rules.

Question 7: When querying using TiFlash, is there a possibility that the data has already been entered into TiKV but has not been synchronized to TiFlash, leading to missing query data? Is there a solution for this?
Answer: There will be a slight data delay. You can choose to increase the synchronization speed, but it will put a bit more pressure on the cluster.
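For question 4, the actual distribution can be checked rather than assumed. The following sketch reads per-store counters from information_schema.tikv_store_status; TiFlash stores are expected to appear in the same view and can be told apart by an engine label in the LABEL column, which is an assumption worth verifying on your own cluster.

```sql
-- Compare REGION_COUNT and LEADER_COUNT across the three TiKV stores.
SELECT STORE_ID, ADDRESS, LABEL, LEADER_COUNT, REGION_COUNT
FROM information_schema.tikv_store_status
ORDER BY STORE_ID;
```

Roughly equal REGION_COUNT values across the TiKV stores indicate that PD has balanced the single replicas; a large skew would point to a hotspot in how the table's keys are written.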

| username: 小跑跑泡 | Original post link

Thank you, expert. Could you please help me with question 6? Some regions have only one peer, which means there is only one replica of that region, right? Others have two. Why is that? I checked the TiFlash synchronization progress, and it has completed.
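A query along the following lines may help narrow down question 6. It is a sketch, not a confirmed diagnosis: it joins information_schema.tikv_region_status with information_schema.tikv_region_peers to count peers per region of the Faces table and shows whether each region holds index data. One possibility to check, which nobody in this thread confirmed, is that the single-peer regions are index regions, since TiFlash keeps table row data and the TiFlash peer appears as a learner.

```sql
-- Peer count per region of table Faces, with the index flag alongside.
-- Assumption to verify: regions with IS_INDEX = 1 have no TiFlash learner peer.
SELECT s.REGION_ID,
       s.IS_INDEX,
       s.INDEX_NAME,
       COUNT(DISTINCT p.PEER_ID) AS peer_count,
       COUNT(DISTINCT CASE WHEN p.IS_LEARNER = 1 THEN p.PEER_ID END) AS learner_peers
FROM information_schema.tikv_region_status AS s
JOIN information_schema.tikv_region_peers AS p
  ON p.REGION_ID = s.REGION_ID
WHERE s.TABLE_NAME = 'Faces'
GROUP BY s.REGION_ID, s.IS_INDEX, s.INDEX_NAME
ORDER BY s.REGION_ID;
```

If every region with peer_count = 1 also has IS_INDEX = 1, the mixed peer counts are consistent with a completed TiFlash sync rather than a replication problem.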

| username: redgame | Original post link

If the data volume of a certain Region is relatively large, it may result in the data of that Region being stored only on one TiKV node and not allocated to TiFlash nodes. This may cause the Region to have only one peer.

| username: 小跑跑泡 | Original post link

So the data will not be synchronized to TiFlash? Will my queries to TiFlash be affected?
