Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiFlash节点down掉会影响集群访问么? (Will a TiFlash node going down affect access to the cluster?)
[TiDB Usage Environment] Online
[TiDB Version] v5.4.2
[Encountered Problem]
The cluster has 2 TiFlash nodes. After one node lost its network connection, queries against the related data stopped working normally. Does TiFlash also need more than half of its nodes alive to stay available?
Does TiFlash also need to keep more than half of the nodes alive to be available?
No, you don't need more than half of them. As long as at least one node holding a replica is still up, it will be fine.
In theory, a TiFlash replica is only a Raft Learner and does not take part in voting, so the concept of a majority does not apply. I suspect the connection handling is not robust enough: when a TiFlash node returns errors, the SQL statement fails, which is why the fallback_to_tikv parameter (the tidb_allow_fallback_to_tikv system variable) exists.
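To make the Learner point concrete, here is a deliberately simplified Python sketch contrasting the two availability rules. The function names are made up for illustration, and the model assumes the table has a TiFlash replica on the surviving node (for example, a replica count of 2 on a 2-node TiFlash deployment). It ignores scheduling, error handling, and timeouts, which is where the real failure in this thread seems to come from.

```python
def voters_available(alive: int, total: int) -> bool:
    """Raft voters (such as TiKV replicas) need a strict majority alive."""
    return alive > total // 2


def tiflash_readable(alive_replicas: int) -> bool:
    """Learner replicas do not vote; one reachable replica can serve reads.

    Assumption: the surviving node actually holds a replica of the table.
    """
    return alive_replicas >= 1


# Scenario from this thread: 2 TiFlash nodes, 1 loses its network connection.
print(tiflash_readable(alive_replicas=1))
# For comparison, a majority rule applied to the same 1-of-2 situation.
print(voters_available(alive=1, total=2))
```

Under these assumptions the first call reports that reads are still possible, while a majority rule would not be satisfied with 1 of 2 nodes. That matches the replies above: the outage is not a quorum problem.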
Problem Summary:
When we get a backoff error from TiFlash, the SQL always fails even though the TiKV nodes are actually alive. We hope to fall back to TiKV after TiFlash is down.
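As a rough sketch of how that fallback could be switched on from an application, the helper below sends the SET statement for tidb_allow_fallback_to_tikv through a DB-API style cursor. The helper name and the RecordingCursor stand-in are hypothetical and exist only so the snippet runs without a database. In practice the cursor would come from whatever MySQL-compatible driver you already use, and you should check the TiDB documentation for your version before relying on it.

```python
def enable_tiflash_fallback(cursor, scope: str = "SESSION") -> str:
    """Ask TiDB to retry a statement on TiKV when the TiFlash read fails.

    cursor: any object with an execute(sql) method connected to TiDB.
    scope:  "SESSION" or "GLOBAL".
    """
    if scope not in ("SESSION", "GLOBAL"):
        raise ValueError("scope must be 'SESSION' or 'GLOBAL'")
    stmt = f"SET {scope} tidb_allow_fallback_to_tikv = 'tiflash'"
    cursor.execute(stmt)
    return stmt


class RecordingCursor:
    """Hypothetical stand-in for a real cursor; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cur = RecordingCursor()
print(enable_tiflash_fallback(cur))
```

Setting the variable at SESSION scope limits the change to one connection, which is a cautious way to test whether the fallback hides the error seen here before changing it globally.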
Oh, then there might be an issue with this node. I’ll check it again tonight.
I’ll take another look in the evening.