How to Implement sum() in a Multi-Replica Scenario

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 多副本场景下,如何实现sum()

| username: atidat

Taking the above image as an example, the configured number of replicas is 2; this table/partitioned table has 5 regions.

In sum(), how is it achieved that sum() = r1 + r2 + r3 + r4 + r5?

| username: 我是咖啡哥 | Original post link

I didn’t understand, what do you want to ask? What do you mean by sum?

| username: atidat | Original post link

sum(column_name) calculates the sum of a column.

| username: 胡杨树旁 | Original post link

It should be summing up each TiKV first and then putting it into the TiDB server for calculation.

| username: 裤衩儿飞上天 | Original post link

Operators are pushed down to each node for computation and then aggregated in TiDB.

| username: atidat | Original post link

Wouldn’t that result in data duplication? For example, with region1, it has been computed on both node-A and node-B.

| username: 胡杨树旁 | Original post link

It should be that only the leader can read and write at the same time, right?

| username: atidat | Original post link

I don’t think so.

  1. If the TiFlash engine is used, the region identity is learner, which would be problematic;

  2. Furthermore, back to TiKV, writing through the leader is fine, but if reading is done through the leader, does that mean data aggregation is done by the leader or the leader has the full data of the table?

| username: Kongdom | Original post link

First, it is important to clarify that this leader is a leader replica, not a leader node. The leader replica has complete data. The leader replicas are evenly distributed across various nodes.

| username: atidat | Original post link

So, TiDB will push down operators to each TiKV node, and after each node’s region-X leader aggregates the data, the TiDB server will summarize it. Is that correct?

Additionally, TiFlash regions are only identified as learners, so how is that achieved?

| username: Kongdom | Original post link

What I understand is that, like TiKV, it is divided into primary and secondary replicas. The primary replica is responsible for read and write operations, while the secondary replica performs asynchronous replication.


TiFlash provides the same snapshot isolation support as TiKV and ensures that the latest data can be read (ensuring that previously written data can be read). This consistency is achieved through replication progress verification.

Each time a read request is received, the Region replica in TiFlash will initiate a progress check with the Leader replica (a very lightweight RPC request). The read response is only given when the progress ensures that the data covered by the read request timestamp is included.

| username: atidat | Original post link


This replication mechanism also inherits TiKV’s automatic load balancing and high availability: it does not rely on additional replication pipelines but directly receives data transmission from TiKV in a many-to-many manner; as long as the data in TiKV is not lost, the TiFlash replicas can be restored at any time.

The above statement should indicate that TiFlash’s regions are all synchronized from TiKV. In this case, the regions in each TiFlash actually have no identity differences.

| username: Kongdom | Original post link

I usually set TiFlash to a single replica, so I don’t pay much attention to operations under multiple replicas for TiFlash.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.