A TiFlash Node is Down and Unable to Start

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 有一个tiflash节点Down无法启动

| username: 点点-求助来了

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.1
[Reproduction Path] What operations were performed when the issue occurred: One TiFlash node is down and cannot start
[Encountered Issue: Symptoms and Impact] One TiFlash node is down and cannot start
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
tiflash.log screenshot

tiflash_tikv.log screenshot

tiflash_error.log screenshot

| username: TI表弟

It looks like a data consistency issue between TiKV and TiFlash. Were any operations performed, or did anything abnormal happen beforehand?

| username: 点点-求助来了

I didn’t do anything. The cluster just felt a bit slow today, so I checked.

| username: TI表弟

Scale out a TiFlash node, and then scale in the faulty one.

| username: TI表弟

You can check the restart time, compare the monitoring, and look at the CPU, memory, and disk I/O during the corresponding time period.

| username: TI表弟

Additionally, it is best to keep some redundancy with two or three TiFlash replicas. That way, if one TiFlash node has an issue, the business is largely unaffected.

| username: 健康的腰间盘

If it won’t start, just scale the node in and then scale it back out.

| username: JaySon-Huang

It seems that starting with TiKV 5.0, the CompactionFilter feature is enabled by default, which causes a compatibility issue with TiFlash in v5.0.1.

This issue is fixed in v5.0.2 and later versions: TiDB 5.0.2 Release Notes | PingCAP Documentation Center
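
As a rough illustration of the version check implied by this reply, the Python sketch below tests whether a cluster version falls in the range described here (5.0.0 up to, but not including, 5.0.2). The function names and the exact lower bound are assumptions made for illustration; the release notes remain the authority on which versions are affected.

```python
def parse_version(text):
    """Turn 'v5.0.1' or '5.0.1' into a tuple of ints such as (5, 0, 1)."""
    # Drop a leading 'v' and any pre-release suffix like '-rc'.
    core = text.strip().lstrip("vV").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def affected_by_compaction_filter_issue(version):
    """Return True if the version is in the assumed affected range.

    Assumption: the affected range is [5.0.0, 5.0.2), based only on
    the reply above, which says the fix landed in v5.0.2.
    """
    return (5, 0, 0) <= parse_version(version) < (5, 0, 2)


if __name__ == "__main__":
    for v in ("4.0.13", "5.0.1", "v5.0.2"):
        print(v, affected_by_compaction_filter_issue(v))
```

For the cluster in this thread (5.0.1), the check returns true, which is consistent with the suggestion to upgrade to v5.0.2 or later.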

| username: 点点-求助来了

I can’t perform the operation. TiFlash seems to keep restarting automatically, and its port is occupied.

| username: 点点-求助来了

The CPU and disk are both normal. When the issue occurred, the disk where TiFlash is located had only 90GB of free space left, which I suspect was the original cause. After I deleted all the extra logs, about half of the space is now free.
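
For anyone who wants to keep an eye on this, here is a minimal Python sketch that reports free space for a given path using the standard library's `shutil.disk_usage`. The function name `free_space_report` is made up for illustration, and the path is a placeholder: point it at the actual TiFlash data directory, which is not given in this thread.

```python
import shutil


def free_space_report(path):
    """Return (free space in GiB, free fraction of the disk) for path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 1024**3
    free_ratio = usage.free / usage.total
    return free_gib, free_ratio


if __name__ == "__main__":
    # "." is a placeholder; use the TiFlash data directory instead.
    gib, ratio = free_space_report(".")
    print(f"{gib:.1f} GiB free ({ratio:.0%} of the disk)")
```

Looking at the free fraction as well as the absolute number helps here, since 90GB can be either plenty or nearly full depending on the disk size.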

| username: WalterWj

The high space usage might be caused by TiFlash generating core dump files or something similar.

It’s recommended to: set TiFlash replica to 0 >>> scale in TiFlash >>> scale out TiFlash >>> set TiFlash replica to 2.
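
The sequence above can be sketched as a small Python helper that only builds the commands in order and does not run anything. It uses the `ALTER TABLE ... SET TIFLASH REPLICA` statement and the `tiup cluster scale-in` / `tiup cluster scale-out` subcommands; the cluster name, node address, topology file name, and table names are all placeholders, and the helper `rebuild_tiflash_steps` itself is hypothetical.

```python
def rebuild_tiflash_steps(cluster, node, topology_file, tables, replicas=2):
    """Build the ordered SQL statements and tiup commands for the procedure.

    All arguments are placeholders supplied by the caller; nothing is
    executed, so the output can be reviewed before running it by hand.
    """
    steps = []
    # 1. Drop the TiFlash replicas of every table that has them.
    for table in tables:
        steps.append(f"ALTER TABLE {table} SET TIFLASH REPLICA 0;")
    # 2. Scale in the faulty TiFlash node.
    steps.append(f"tiup cluster scale-in {cluster} --node {node}")
    # 3. Scale out TiFlash again using a topology file.
    steps.append(f"tiup cluster scale-out {cluster} {topology_file}")
    # 4. Restore the replicas (this needs at least `replicas` TiFlash nodes).
    for table in tables:
        steps.append(f"ALTER TABLE {table} SET TIFLASH REPLICA {replicas};")
    return steps


if __name__ == "__main__":
    for step in rebuild_tiflash_steps(
        "my-cluster", "10.0.1.4:9000", "scale-out.yaml", ["test.t1"]
    ):
        print(step)
```

Keeping the steps as plain strings makes the order explicit: replicas go to 0 before the node is removed and are restored only after the new node is in place.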

TiFlash keeps starting automatically because the system daemon keeps trying to restart it.

| username: zhaokede

Is the data volume large? If not, follow the suggestions above: scale in and then scale out.