Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: How is sum() implemented in a multi-replica scenario? (多副本场景下,如何实现sum())
Taking the image above as an example: the configured number of replicas is 2, and this table (or partitioned table) has 5 Regions.
For sum(), how is it achieved that sum() = r1 + r2 + r3 + r4 + r5?
I don't understand what you're asking. What do you mean by sum?
sum(column_name) calculates the sum of a column.
It should first compute the sum on each TiKV node and then send the partial results to the TiDB server for the final calculation.
Operators are pushed down to each node for computation and then aggregated in TiDB.
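To make the pushdown concrete, here is a sketch using a hypothetical table t with a numeric column amount (exact operator names and IDs vary by TiDB version): the partial sum runs as a cop[tikv] task on the storage side, and TiDB computes the final sum at the root.

```sql
-- Hypothetical table t with a numeric column `amount`; the plan below is
-- illustrative of the typical shape, not verbatim output.
EXPLAIN SELECT SUM(amount) FROM t;
-- id                       task       operator info
-- HashAgg_11               root       funcs:sum(Column#5)->Column#4        -- final agg in TiDB
-- └─TableReader_12         root       data:HashAgg_6
--   └─HashAgg_6            cop[tikv]  funcs:sum(test.t.amount)->Column#5   -- partial agg on TiKV
--     └─TableFullScan_10   cop[tikv]  table:t, keep order:false
```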
Wouldn't that result in data being counted twice? For example, region1 has a replica on both node-A and node-B, so it would be computed on both.
Shouldn't it be that only the leader handles both reads and writes?
First, it's important to clarify that this leader is a leader replica, not a leader node. Each Region's leader replica holds that Region's complete data, and leader replicas are distributed evenly across the nodes.
So TiDB pushes operators down to each TiKV node, each Region's leader replica aggregates its own data, and the TiDB server then combines the partial results. Is that correct?
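As a hedged illustration (the table name t, column layout, and values below are made up), you can see which store holds each Region's leader with SHOW TABLE ... REGIONS. The coprocessor requests for the pushed-down aggregation are sent to each Region's leader, so every row is read exactly once even though each Region has two replicas:

```sql
-- Hypothetical 5-Region table t; column set and values are illustrative.
SHOW TABLE t REGIONS;
-- REGION_ID  START_KEY       END_KEY         LEADER_STORE_ID
-- 101        t_100_          t_100_r_10000   1   -- leader on node-A
-- 102        t_100_r_10000   t_100_r_20000   2   -- leader on node-B
-- 103        t_100_r_20000   t_100_r_30000   1
-- 104        t_100_r_30000   t_100_r_40000   2
-- 105        t_100_r_40000   +inf            1
```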
Additionally, TiFlash Region replicas are only Raft learners, so how is this achieved for TiFlash?
My understanding is that, like TiKV, TiFlash is divided into primary and secondary replicas: the primary replica handles reads and writes, while the secondary replicas replicate asynchronously.
Consistency
TiFlash provides the same snapshot isolation as TiKV and guarantees that the latest data can be read (i.e., data that was previously written is guaranteed to be readable). This consistency is achieved through a replication-progress check.
On each read request, the Region replica in TiFlash sends a progress check to the leader replica (a very lightweight RPC request). It only responds to the read once its replication progress covers the data at the read request's timestamp.
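For reference, a minimal sketch (again with a hypothetical table t; verify the hint and system table against your TiDB version) of adding TiFlash learner replicas, checking their replication progress, and steering an analytical read to TiFlash:

```sql
-- Add TiFlash learner replicas for a hypothetical table t.
ALTER TABLE t SET TIFLASH REPLICA 2;

-- Check replication availability and progress as reported by TiDB.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE table_schema = 'test' AND table_name = 't';

-- Hint the optimizer to read from TiFlash for the aggregation.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[t]) */ SUM(amount) FROM t;
```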
I'm skeptical about that.
This replication mechanism also inherits TiKV's automatic load balancing and high availability: it does not rely on an extra replication pipeline, but receives data directly from TiKV in a many-to-many fashion; as long as the data in TiKV is not lost, the TiFlash replicas can be restored at any time.
The statement above suggests that TiFlash's Regions are all replicated from TiKV, so the Region replicas within TiFlash have no role distinction among themselves.
I usually deploy TiFlash with a single replica, so I haven't paid much attention to how TiFlash behaves with multiple replicas.