Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 有一个tiflash节点Down无法启动 (One TiFlash node is down and cannot start)
[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.1
[Reproduction Path] What operations were performed when the issue occurred: One TiFlash node is down and cannot start
[Encountered Issue: Symptoms and Impact] One TiFlash node is down and cannot start
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
tiflash.log screenshot
tiflash_tikv.log screenshot
tiflash_error.log screenshot
It looks like a data consistency issue between TiKV and TiFlash. Were any operations performed, or did anything abnormal happen beforehand?
I didn't do anything. Things just felt a bit slow today, so I checked.
Scale out and then scale in.
You can check the restart time, compare the monitoring, and look at the CPU, memory, and disk I/O during the corresponding time period.
Additionally, it is best to have some redundancy with two or three replicas. If one TiFlash node has an issue, it will basically not affect the business.
If it won't start, just scale the node in and back out.
It seems that starting with TiKV 5.0, the CompactionFilter feature is enabled by default, which causes compatibility issues with TiFlash in v5.0.1.
This issue was fixed in v5.0.2 and later: TiDB 5.0.2 Release Notes | PingCAP Documentation Center
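To check whether a given cluster version already includes that fix, a small comparison helper is enough. This is only a sketch: the function names are made up for illustration, the fixed-in version comes from the release notes mentioned above, and it only handles plain versions such as 5.0.1 or v5.0.2 (no suffixes).

```python
# Version in which the fix shipped, per the release notes mentioned above.
FIXED_IN = (5, 0, 2)


def parse_version(text):
    """Turn 'v5.0.1' or '5.0.1' into the tuple (5, 0, 1)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_compaction_filter_fix(version):
    """Return True if the version is at or above the fixed-in version."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    # The version reported in this thread.
    print(has_compaction_filter_fix("5.0.1"))
```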
I can't perform the operation. TiFlash seems to keep restarting automatically, and the port is occupied.
The CPU and disk I/O are both normal. When the issue occurred, the disk where TiFlash is located had only 90GB of space left, and I suspect that was the original cause. After deleting the extra logs, half of the disk is now free.
High space usage might be due to TiFlash generating core files or similar activities.
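For keeping an eye on free space on the TiFlash disk, a minimal Python sketch using the standard library could look like this. The 20% threshold is an arbitrary assumption, not a TiFlash recommendation, and the path is whatever directory the TiFlash data actually lives on.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low_on_space(path, min_free_fraction=0.2):
    """Return True if free space is below the (assumed) threshold."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; point this at the TiFlash data directory.
    print(f"free: {free_fraction('.'):.1%}, low: {is_low_on_space('.')}")
```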
It's recommended to: set TiFlash replica to 0 >>> scale in TiFlash >>> scale out TiFlash >>> set TiFlash replica to 2.
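The sequence above can be written out as an ordered list of commands. The Python sketch below only builds the command strings and does not run anything. The cluster name, table name, node address, and topology file name are placeholders, not values from this thread. The statement and commands follow the usual `ALTER TABLE ... SET TIFLASH REPLICA` and `tiup cluster scale-in` / `tiup cluster scale-out` forms.

```python
def tiflash_rebuild_plan(cluster, tables, node, topology_file, replicas=2):
    """Return the ordered steps for rebuilding a TiFlash node.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    steps = []
    # 1. Remove the TiFlash replicas of every table that has them.
    for table in tables:
        steps.append(f"ALTER TABLE {table} SET TIFLASH REPLICA 0;")
    # 2. Scale in the broken TiFlash node.
    steps.append(f"tiup cluster scale-in {cluster} --node {node}")
    # 3. Scale out a fresh TiFlash node described in a topology file.
    steps.append(f"tiup cluster scale-out {cluster} {topology_file}")
    # 4. Restore the TiFlash replicas.
    for table in tables:
        steps.append(f"ALTER TABLE {table} SET TIFLASH REPLICA {replicas};")
    return steps


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for step in tiflash_rebuild_plan(
        "my-cluster", ["test.t1"], "10.0.0.5:9000", "scale-out.yaml"
    ):
        print(step)
```

Setting the replica count to 2 assumes at least two TiFlash nodes are available after scaling out.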
The automatic startup of TiFlash is caused by the system daemon, which keeps trying to restart it.
Is the data volume large? If not, follow the suggestions above: scale in and then scale out.