After a single TiFlash node crashes, SQL does not fall back to a TiKV execution plan and directly reports that the region is unavailable

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 单个Tiflash节点宕机后,SQL无法继续使用TIKV节点执行计划,直接报region不可用

| username: Curiouser

【TiDB Usage Environment】Production Environment
【TiDB Version】6.1.1

【Encountered Problems】

  1. After a single TiFlash node crashes, SQL does not fall back to the TiKV nodes and directly reports that the region is unavailable.
  2. I saw a related thread on the forum, "tiflash宕机,优化器不能感知tiflash宕机,执行计划依然使用tiflash存储引擎,导致sql不能出结果。" (TiFlash is down, the optimizer does not detect it, the execution plan still uses the TiFlash storage engine, and the SQL returns no result) in the TiDB Q&A community, but I am not sure whether setting the system variable tidb_allow_fallback_to_tikv can solve the problem.

【Reproduction Path】What operations were performed that caused the problem
【Problem Phenomenon and Impact】

| username: WalterWj | Original post link

For SQL that needs to go through TiFlash, falling back to TiKV after TiFlash goes down will be a disaster most of the time.

| username: Curiouser | Original post link

That is fine in this case. The tables involved in the SQL do not hold much data.

| username: tidb菜鸟一只 | Original post link

Can’t you set this system variable with set global tidb_allow_fallback_to_tikv = 'tiflash'; ?
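For reference, a minimal sketch of trying the variable at session scope first, so that other production connections are not affected. It assumes tidb_allow_fallback_to_tikv can be set per session (the TiDB documentation lists its scope as SESSION | GLOBAL, with allowed values '' and 'tiflash'). The table name t is hypothetical and stands in for the affected query.

```sql
-- Check the current value; an empty string means fallback is disabled.
SHOW VARIABLES LIKE 'tidb_allow_fallback_to_tikv';

-- Enable fallback to TiKV for this session only.
SET SESSION tidb_allow_fallback_to_tikv = 'tiflash';

-- Re-run the affected statement (hypothetical table t).
SELECT COUNT(*) FROM t;

-- Restore the default once the test is done.
SET SESSION tidb_allow_fallback_to_tikv = '';
```

If the session-level test behaves as expected while a TiFlash node is down, the same value could then be applied with SET GLOBAL.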

| username: Curiouser | Original post link

I haven’t tested it yet. This is a production environment, so I can’t experiment freely, and I would like confirmation from someone who has already set it and seen it work before I change it. According to reply #12 (from 林先森cC) in the thread "tiflash宕机,优化器不能感知tiflash宕机,执行计划依然使用tiflash存储引擎,导致sql不能出结果。" in the TiDB Q&A community, someone did set it, but it did not seem to have any effect.

| username: Lucien-卢西恩 | Original post link

That’s the right variable. Performance testing is recommended before enabling it: if a full table scan caused by the fallback brings down TiKV, the change would be counterproductive.
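One possible way to run such a test, sketched under assumptions: restrict the current session to TiKV with tidb_isolation_read_engines and inspect the cost of the resulting plan. The table t and the query are hypothetical placeholders. Note that EXPLAIN ANALYZE actually executes the statement, so it is safer to run it on a test cluster or during a quiet period.

```sql
-- Show which storage engines this session may read from.
SHOW VARIABLES LIKE 'tidb_isolation_read_engines';

-- Exclude TiFlash for this session, so the optimizer must plan on TiKV.
SET SESSION tidb_isolation_read_engines = 'tikv,tidb';

-- Plan only: look for full table scans on TiKV (hypothetical query).
EXPLAIN SELECT COUNT(*) FROM t;

-- Plan plus real execution statistics; this runs the query.
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;

-- Allow TiFlash again for this session.
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
```

This approximates what a fallback would cost on TiKV without taking a TiFlash node offline.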