How many TiFlash replicas are recommended?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash副本数建议设置几个

| username: TiDBer_Jzo3iMXn

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.1
[Reproduction Path] Table with over 60 million rows, executing left join and aggregate queries
[Encountered Problem: Phenomenon and Impact] A single SQL query takes 0.4 seconds; with 40 concurrent queries, each query takes 2.7 seconds
[Resource Configuration] A total of 4 machines, details as follows

  1. 16 cores, 32 threads, 64GB, TiDB server deployed on its own
  2. 16 cores, 32 threads, 64GB, TiKV, PD, and TiFlash deployed together
  3. Same as 2
  4. Same as 2

I have set the TiFlash replica count to 2, and the execution plan does use TiFlash. Without concurrency, a single query is very fast, at 0.4 seconds.
But when the same table is queried concurrently, it becomes very slow, with a single query taking 2.7 seconds.
Is this because reading the same table causes too much IO?
Also, how many TiFlash replicas are recommended?

| username: MrSylar | Original post link

  1. TiFlash should be deployed separately and physically isolated from other components.
  2. Multiple replicas of TiFlash can be used not only for MPP but also for high availability. The number of replicas you need depends on your business response requirements and high availability needs.
  3. To understand why the concurrent case is slow, compare the execution plans of the fast and slow runs.
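
One way to do the comparison in point 3 is to capture the plan with runtime statistics twice: once on an idle cluster and once while the 40 concurrent queries are running. The following is only a sketch; the tables `customers` and `orders` and their columns are hypothetical stand-ins for the real left join and aggregate query, which is not shown in this thread.

```sql
-- Hypothetical tables and columns; substitute the real left join + aggregate query.
-- Run this once on an idle cluster and once under the concurrent workload,
-- then compare the two outputs operator by operator.
EXPLAIN ANALYZE
SELECT c.region,
       COUNT(*)      AS order_cnt,
       SUM(o.amount) AS total_amount
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.region;
```

In the output, the task column shows where each operator ran (for example on TiFlash or on TiKV), and the execution info column shows the time each operator actually took, which helps locate the step that slows down under load.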

| username: zhanggame1 | Original post link

TiFlash is designed for OLAP and is not well suited to high concurrency.

| username: Kongdom | Original post link

  • replica_count is the number of TiFlash replicas the user has requested for a table with alter table xxx set tiflash replica N.
  • available indicates whether a complete TiFlash replica is available. For example, if replica_count = 2 but only 1 replica has finished synchronizing, available is already 1, because the table can be queried on TiFlash at that point.
  • In principle, replica_count relates only to high availability, not to query performance. In practice, however, too many replicas can hurt performance because the cluster has more data to manage. To balance high availability and query performance, 2 TiFlash replicas are generally recommended.
  • The performance of TiFlash is not related to the number of TiKV replicas.
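
As a rough illustration of the fields described above, the statements below set the replica count and then read it back from the `information_schema.tiflash_replica` table. The schema and table names (`test.t`) are assumptions made for this example only.

```sql
-- Hypothetical schema/table (test.t); replace with your own.
-- Request 2 TiFlash replicas for the table.
ALTER TABLE test.t SET TIFLASH REPLICA 2;

-- Check replication status:
--   REPLICA_COUNT is the requested number of replicas,
--   AVAILABLE is 1 once the table can be queried on TiFlash,
--   PROGRESS goes from 0 to 1 as replication completes.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
```

The requested replica count cannot exceed the number of TiFlash nodes in the cluster, so with three TiFlash nodes as in the original post, a value of 2 is accepted.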

| username: cy6301567 | Original post link

Usually, two are enough.

| username: 逍遥_猫 | Original post link

Generally, it is recommended to set 2 TiFlash replicas.

| username: 逍遥_猫 | Original post link

May I ask, if TiFlash has 2 replicas, is the data for these 2 replicas transferred through the TiKV log?

| username: ShawnYan | Original post link

A single TiFlash node can hold only one replica. To have two replicas, you need to run two TiFlash instances. Are you asking whether both TiFlash instances copy their replicas from TiKV, rather than one TiFlash copying from TiKV and the other copying from the first TiFlash?

Each of the two TiFlash instances copies its replica from TiKV. You can see the specific principle here:

| username: zhanggame1 | Original post link

The general recommendation is 2.

| username: Kongdom | Original post link

:astonished: Do two replicas mean two TiFlash nodes?

| username: TiDBer_vfJBUcxl | Original post link

Two replicas mean that the data is stored in two copies.

| username: TiDBer_vfJBUcxl | Original post link

TiKV uses Raft for data replication. Each data change is recorded as a Raft log, and through Raft’s log replication feature, the data is safely and reliably synchronized to every node in the replication group. However, in actual writes, according to the Raft protocol, it is only necessary to replicate to a majority of nodes to safely consider the data write successful.
[Chapter 3 of the TiDB Series: Principles of TiDB Distributed Database Storage]

| username: realcp1018 | Original post link

Well, in our practice, two replicas are sufficient.
However, the performance of analytical queries is mainly related to the number of nodes. Adding more nodes is necessary to leverage the MPP architecture for acceleration.
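
To check whether a query actually runs in MPP mode across the TiFlash nodes, a sketch like the following may help. The session variables `tidb_allow_mpp` and `tidb_enforce_mpp` are real TiDB system variables; the tables and columns in the query are hypothetical.

```sql
-- Allow the optimizer to choose MPP mode (this is the default).
SET @@session.tidb_allow_mpp = 1;
-- For testing only: make the optimizer prefer MPP regardless of its cost estimate.
SET @@session.tidb_enforce_mpp = 1;

-- Hypothetical query; substitute the real one.
EXPLAIN
SELECT c.region, COUNT(*) AS order_cnt
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.region;
```

If the plan contains ExchangeSender and ExchangeReceiver operators running as TiFlash MPP tasks, the work is being distributed across TiFlash nodes, which is the case where adding nodes can help.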

| username: tony5413 | Original post link

Usually, it’s 2.

| username: redgame | Original post link

Two. Everyone is very consistent on this…

| username: cy6301567 | Original post link

Regarding the setting of TiFlash replica count, it is recommended to consider the following factors:

  1. Performance Requirements: Increasing the number of TiFlash replicas can improve query performance and fault tolerance but will also increase resource consumption. You need to balance performance and resource consumption based on your business needs and query load.
  2. Hardware Resources: Each TiFlash replica requires certain hardware resources, including CPU, memory, and storage. You need to ensure that there are enough hardware resources in the cluster to support the required number of replicas.
  3. Data Replication Latency: TiFlash replicas need to synchronize data with each other, and increasing the number of replicas may lead to increased data synchronization latency. This could affect the real-time nature of queries.
  4. Fault Tolerance: Increasing the number of TiFlash replicas can enhance the system’s fault tolerance, as other replicas can continue to provide query services when some replicas fail.

Generally, it is recommended to maintain at least two TiFlash replicas to ensure basic fault tolerance and high availability. Depending on business needs and hardware resources, you may consider increasing the number of TiFlash replicas to three or more to provide data security assurance.

| username: 啦啦啦啦啦 | Original post link

Increasing the number of replicas will not improve performance.

| username: zhanggame1 | Original post link

Two, although I haven’t tested it, that’s what everyone says.

| username: 烂番薯0 | Original post link

Typically, there are two TiFlash instances, but you should avoid deploying them together with TiKV. It’s best to keep them isolated.

| username: 昵称想不起来了 | Original post link

Generally, 2.