Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tilfash 挂掉以后,表就无法访问了 (After TiFlash goes down, the tables become inaccessible)
May I ask: if TiFlash goes down, is it normal that tables with TiFlash replicas cannot be queried, even though the tidb_isolation_read_engines parameter includes TiKV?
If the engine is not explicitly specified, what happens when TiFlash crashes in the production environment? Would every query that needs TiFlash fail to execute? This design doesn’t seem reasonable. If TiFlash is inaccessible, queries should fall back to TiKV.
The tidb_allow_fallback_to_tikv variable was introduced in v5.0.
Scope: SESSION | GLOBAL
Persisted to the cluster: Yes
Default value: “”
This variable specifies the list of storage engines that will fall back to TiKV as a backup storage engine. When a storage engine in this list fails, causing the SQL statement execution to fail, TiDB will use TiKV as the storage engine to re-execute the SQL statement. Currently, this variable can be set to “” or “tiflash”. If this variable is set to “tiflash”, when TiFlash returns a timeout error (corresponding error code is ErrTiFlashServerTimeout), TiDB will use TiKV as the storage engine to re-execute the SQL statement.
Thank you, but I feel that setting the default value of this parameter to “” is not very reasonable; it should be set to tiflash!
The default is chosen to protect TP workloads first. If TiFlash goes down, many large AP-type queries hitting TiKV might affect TP workloads. Therefore the default is “”, and you can change it according to your actual situation.
That makes sense, thank you, teacher. So, may I ask, how many seconds does TiDB consider TiFlash to be inaccessible before it starts accessing TiKV? What is the approximate timeout duration?
When running clientConn.handleQuery / clientConn.handleStmtExecute, if we get an ErrTiFlashServerTimeout error and the execution of the SQL has no side effects (if the TiDB server has already sent data to the client, or the execution is in cursor mode, the execution has side effects), we delete TiFlash from IsolationReadEngines, retry executing the SQL, and then add TiFlash back to IsolationReadEngines. In addition, a switch, tidb_enable_tiflash_fallback_tikv, controls whether to retry; the switch is off by default.