Added two TiFlash nodes, executed "ALTER TABLE tab SET TIFLASH REPLICA 1"; data synchronization shows no progress after a while

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 新增两个tiflash节点,执行alter table tab SET TIFLASH REPLICA 1;数据同步一点后无进度

| username: qiuxb

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v4.0.8
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]

Two old TiFlash nodes were behaving abnormally, so I scaled them in and took them offline, then added two new TiFlash nodes.
After that I executed ALTER TABLE tab SET TIFLASH REPLICA 1;
A small amount of data synchronized, and then progress stalled. On the TiFlash machines, disk usage also stopped growing.

| username: tidb狂热爱好者 | Original post link

Wait for 0.05.

| username: Jolyne | Original post link

TiFlash pulls Raft logs from TiKV and replicates the data asynchronously. Wait a bit longer and check again.
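One way to check whether replication is actually advancing is to query the TiFlash replica status table. This is a minimal sketch; the schema name `test` is an assumption, since the thread only gives the table name `tab`.

```sql
-- Replication status of the TiFlash replica for one table.
-- Assumption: the table lives in a schema named `test`; substitute your own.
-- AVAILABLE = 1 means the replica is ready to serve queries.
-- PROGRESS ranges from 0.0 to 1.0.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 'tab';
```

Running it a few minutes apart and comparing PROGRESS should show whether the replica is slow or completely stalled.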

| username: qiuxb | Original post link

The progress has moved, but it’s too slow. This table has 10 million rows, and I also have a table with 2.2 billion rows. How can I speed it up?

| username: tidb狂热爱好者 | Original post link

Upgrade hardware

| username: qiuxb | Original post link

TiFlash runs on separate physical machines, all with high-end configurations. The bottleneck does not seem to be in TiFlash.

| username: qiuxb | Original post link

Currently, TiFlash replicas have been set for 7 tables, and 4 of them show progress. Will this synchronization affect TiKV performance?

| username: tidb菜鸟一只 | Original post link

TiFlash performance in v4.0 is probably somewhat lacking; it is much stronger in newer versions.

| username: Jolyne | Original post link

No. It only reads the Raft log on TiKV and does not perform any other operations on TiKV. The two are separate.

| username: wuxiangdong | Original post link

Adding more TiFlash nodes should improve the situation.

| username: 魔礼养羊 | Original post link

Slowness comes down to hardware (CPU, memory, network I/O, disk I/O), the software environment (interference from other applications), or database execution efficiency. Since TiFlash works by reading TiKV’s Raft log, first check whether CPU or memory is a bottleneck on either TiFlash or TiKV. If neither is, adding hardware resources will not help much. Second, check the system processes. Database systems are generally deployed on dedicated machines, so interference is unlikely, but it is worth checking. Finally, for database execution efficiency, start with the logs and look for failures or warnings. A report of slow execution without logs makes the problem difficult to diagnose.
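As a starting point for the log check, the cluster system tables can be queried from a SQL client. The following is only a sketch: the time window is a placeholder, and filtering on TYPE = 'tikv' assumes that TiKV logs are searchable through this table in your version; if not, read the log files on the nodes directly.

```sql
-- Confirm that the new TiFlash nodes are registered in the cluster.
SELECT TYPE, INSTANCE, VERSION, START_TIME
FROM information_schema.cluster_info
WHERE TYPE IN ('tiflash', 'tikv');

-- Look for warnings and errors on TiKV during the stalled period.
-- The time range below is a hypothetical placeholder; always bound the
-- search by time so that it does not scan every log file.
SELECT TIME, INSTANCE, LEVEL, MESSAGE
FROM information_schema.cluster_log
WHERE TYPE = 'tikv'
  AND LEVEL IN ('WARN', 'ERROR')
  AND TIME > '2020-01-01 00:00:00'
  AND TIME < '2020-01-01 01:00:00';
```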