TiKV went offline and cannot be started

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv离线起不来

| username: 今天不想写代码

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.2.0
[Encountered Problem: Symptoms and Impact]

A TiKV node in the production environment went offline and could not be started again with tiup cluster.

| username: TiDBer_jYQINSnf

Rebuild this node; that is the simplest and most reliable method.

| username: zhanggame1

Scale the node in and then scale it out again; there’s no better way.

| username: 像风一样的男子

Version 6.2 is a DMR (Development Milestone Release) and is not recommended for production use.

| username: ffeenn

Check whether the node’s disk is full, or whether write permissions are missing.
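A minimal sketch of that check in Python, using only the standard library. The function name `check_data_dir` and the 5% threshold are illustrative assumptions, not TiKV settings, and the temporary directory is only a stand-in for the failing node's data directory.

```python
import os
import shutil
import tempfile


def check_data_dir(path, min_free_ratio=0.05):
    """Report free space and writability for a directory.

    min_free_ratio is an arbitrary illustrative threshold, not a TiKV default.
    """
    usage = shutil.disk_usage(path)
    free_ratio = usage.free / usage.total
    return {
        "free_ratio": free_ratio,
        "low_space": free_ratio < min_free_ratio,
        # W_OK | X_OK: the current user can create files inside the directory.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


# Placeholder path; substitute the data directory of the failing TiKV node.
print(check_data_dir(tempfile.gettempdir()))
```

Run it as the same OS user that runs the TiKV process, since `os.access` answers for the current user only.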

| username: Kongdom

Doesn’t the appearance of “welcome” mean that it has started up?

| username: TiDBer_jYQINSnf

“Welcome” is only the first line logged at startup; “ready to serve” is what indicates that it has actually started.
In this case, the node is restarting continuously.
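The distinction can be checked mechanically. The sketch below counts startup attempts and reports whether the latest one reached the ready state. It matches only the two substrings quoted in this thread; the function name and the sample lines are invented for illustration and are not real TiKV log output.

```python
def summarize_startup(log_lines, welcome="Welcome", ready="ready to serve"):
    """Count startup attempts and check whether the last one became ready."""
    attempts = 0
    last_attempt_ready = False
    for line in log_lines:
        if welcome in line:
            # A new startup banner means the process has started (again).
            attempts += 1
            last_attempt_ready = False
        elif ready in line:
            last_attempt_ready = True
    return {
        "attempts": attempts,
        "last_attempt_ready": last_attempt_ready,
        "restart_loop_suspected": attempts > 1 and not last_attempt_ready,
    }


# Invented sample lines, only to exercise the function.
sample = [
    "... Welcome ...",
    "... some initialization step ...",
    "... Welcome ...",
]
print(summarize_startup(sample))
```

To use it on a real node, pass the lines of the TiKV log file, for example `summarize_startup(open(path))`, where `path` is wherever that node writes its log.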

| username: xfworld

It doesn’t have much impact; just scale the node in and then scale it out again.

| username: TiDBer_QKDdYGfz

Following this thread. Scaling in on a production cluster is really scary.

| username: Kongdom

:flushed: I hadn’t noticed this detail before~

| username: 希希希望啊

No big deal; scale the node in and then scale it out again.

| username: lemonade010

Was the node offline for too long? Could that have caused the log to be overwritten?

| username: zhaokede

Focus on solving the problem.
You can try this: first take a full backup at the operating-system level, then scale in and scale out with that backup in place. It’s safer this way.

| username: TiDBer_ZxWlj6A1

Is it really that fragile? Scaling in and out seems quite troublesome.

| username: 呢莫不爱吃鱼

First scale in, then scale out.

| username: 小于同学

Upgrade the version.

| username: 今天不想写代码

I tried to scale it in, but after two hours it still hasn’t finished. Is the node too broken to be scaled in?

| username: 像风一样的男子

How many nodes do you have in total? Don’t tell me you only have three nodes and one of them is down?

| username: xfworld

No. Before you scale in, check whether the number of remaining nodes is sufficient. If it is not, scale out first, then scale in.
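As a rough illustration of that check: a scale-in can only finish if the stores that remain healthy can hold every replica. The helper below is hypothetical, and the default of 3 assumes the cluster still uses PD's default replica count (`max-replicas`); adjust it if yours differs.

```python
def stores_to_add_first(healthy_stores_after_removal, max_replicas=3):
    """Return how many TiKV nodes to add before scaling in (0 if none).

    Assumes each replica of a Region must live on a different store, so at
    least max_replicas healthy stores have to remain after the removal.
    """
    return max(0, max_replicas - healthy_stores_after_removal)


# Three-node cluster with one node down: only 2 healthy stores would remain,
# so one node has to be added before the broken one can be removed.
print(stores_to_add_first(2))  # 1
# Four-node cluster with one node down: 3 healthy stores remain, nothing to add.
print(stores_to_add_first(3))  # 0
```

This matches the three-node case raised earlier in the thread: with one of three nodes down, the scale-in has nowhere to move the replicas and will not complete.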

| username: TiDBer_jYQINSnf

Even if it’s broken, it can still be scaled in. When you run the store command in pd-ctl, you should see the number of Regions on it gradually decrease as other TiKV nodes replenish the replicas.
If there is still a leader on it, then your cluster has a problem.
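A small sketch of how that progress could be tracked from the JSON printed by the store command. The field names (`stores`, `store.id`, `store.state_name`, `status.region_count`, `status.leader_count`) are assumptions modeled on that output and should be checked against your own cluster; the sample document and the store ID are invented.

```python
import json


def store_progress(store_json, store_id):
    """Extract state, Region count, and leader count for one store."""
    data = json.loads(store_json)
    for item in data.get("stores", []):
        if item["store"]["id"] == store_id:
            status = item.get("status", {})
            return {
                "state": item["store"].get("state_name"),
                # Counts are treated as 0 when the field is absent.
                "region_count": status.get("region_count", 0),
                "leader_count": status.get("leader_count", 0),
            }
    return None  # store ID not present in the output


# Invented sample, shaped like the assumed output format.
sample = json.dumps({
    "count": 1,
    "stores": [
        {
            "store": {"id": 4, "state_name": "Offline"},
            "status": {"leader_count": 0, "region_count": 120},
        }
    ],
})
print(store_progress(sample, 4))
```

Running this periodically and comparing `region_count` between runs shows whether the replicas are actually moving away; a nonzero `leader_count` on the broken store is the warning sign mentioned above.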