TiFlash connection to TiKV timed out, causing slow SQL queries

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash连接tikv超时,导致sql查询慢

| username: magdb

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.1
[Encountered Problem: Phenomenon and Impact]
The cluster uses TiFlash to query data, and SQL queries intermittently take an extremely long time, which affects production business. The TiFlash logs show connection timeouts. How should this be handled?
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]
SQL Execution Time

TiFlash Logs

| username: WalterWj | Original post link

Has TiFlash experienced any restarts or similar issues?

| username: magdb | Original post link

No, TiFlash has always been online.

| username: tidb菜鸟一只 | Original post link

Is it that none of the tables with TiFlash replicas can be queried, or is it just a specific table that cannot be queried?

| username: tidb菜鸟一只 | Original post link

Your slow SQL doesn't look like an aggregation-type (analytical) query that would use TiFlash.
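
One way to check this is to run `EXPLAIN` on the slow statement and look at the `task` column: operators that run on TiFlash show `cop[tiflash]`, `batchCop[tiflash]`, or `mpp[tiflash]`, while TiKV operators show `cop[tikv]`. Below is a minimal sketch of that check. The helper name `uses_tiflash`, the table `orders`, and the sample plan rows are made up for illustration and are not taken from this cluster.

```python
def uses_tiflash(explain_rows):
    """Return True if any operator in an EXPLAIN result runs on TiFlash.

    explain_rows: iterable of tuples in the column order of a plain EXPLAIN:
    (id, estRows, task, access object, operator info).
    """
    # The third column (index 2) is the task type, e.g. "cop[tikv]".
    return any("tiflash" in str(row[2]).lower() for row in explain_rows)


# Hypothetical plan for an index lookup that is served entirely by TiKV.
sample_plan = [
    ("IndexLookUp_10", "1.00", "root", "", ""),
    ("IndexRangeScan_8(Build)", "1.00", "cop[tikv]",
     "table:orders, index:idx_uid(user_id)", "range:[42,42]"),
    ("TableRowIDScan_9(Probe)", "1.00", "cop[tikv]",
     "table:orders", "keep order:false"),
]

print(uses_tiflash(sample_plan))
```

If the real plan for the slow statement shows only `cop[tikv]` tasks, the statement is not being read from TiFlash at all, and the slowness would need to be investigated on the TiKV side.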

| username: magdb | Original post link

Queries on tables with TiFlash replicas are very slow for some periods and normal at other times. While the queries are slow, TiFlash reports the error above. All of the affected tables have TiFlash replicas added.

| username: tidb菜鸟一只 | Original post link

Your two TiFlash instances are deployed on the same nodes as TiKV. I suggest separating them: on nodes 23 and 24, remove one TiKV instance from one node and one TiFlash instance from the other. Before doing that, reduce the TiFlash replica count from 2 to 1.
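
Reducing the replica count is done per table with `ALTER TABLE ... SET TIFLASH REPLICA`, and the result can be checked in `information_schema.tiflash_replica`. As a rough sketch, the statements could be generated like this. The helper name and the schema and table names (`app.orders`, `app.order_items`) are placeholders, not tables from this cluster.

```python
def tiflash_replica_statements(tables, replica_count=1):
    """Build one ALTER TABLE statement per (schema, table) pair.

    replica_count is the target number of TiFlash replicas (1 here,
    following the suggestion to go from 2 replicas down to 1).
    """
    if not isinstance(replica_count, int) or replica_count < 0:
        raise ValueError("replica_count must be a non-negative integer")
    return [
        f"ALTER TABLE `{schema}`.`{table}` SET TIFLASH REPLICA {replica_count};"
        for schema, table in tables
    ]


# Query to confirm the new replica count and that the replicas are available.
CHECK_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
    "FROM information_schema.tiflash_replica;"
)

# Placeholder table list; replace with the tables that have TiFlash replicas.
for stmt in tiflash_replica_statements([("app", "orders"), ("app", "order_items")]):
    print(stmt)
print(CHECK_SQL)
```

Once every table reports `REPLICA_COUNT` 1 and `AVAILABLE` 1, the extra TiFlash instance can be scaled in (for a TiUP deployment, with `tiup cluster scale-in`), so that TiKV and TiFlash no longer share a node.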