After a TiDB OOM, restarting the virtual machine causes errors when starting the cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB oom之后重启虚拟机 启动集群出错

| username: TiDBer_5M9L07sN

[TiDB Environment] Test environment
[TiDB Version]
[Encountered Problem] After TiDB ran out of memory (OOM), the machine became sluggish, so the server was restarted. After the reboot, starting the cluster failed with the error shown in the image below:

[Reproduction Path]
[Problem Phenomenon and Impact]

| username: TiDBer_5M9L07sN | Original post link

I think the main reason is that the execution plans differ. You can run EXPLAIN ANALYZE to see the actual execution plan and the time spent in each operator.

| username: xfworld | Original post link

How did you restart it? The logs show that PD crashed.

If a restart still can't bring PD back to a healthy state, data has most likely been lost, which is why the service is malfunctioning.

In that case, the only option left is PD's unsafe recovery, which will probably result in some data loss.

| username: TiDBer_5M9L07sN | Original post link

Perform the related recovery operations.

| username: tidb狂热爱好者 | Original post link

I can't imagine restarting the server just because of an OOM (out of memory).

| username: Hacker007 | Original post link

Shouldn’t you avoid restarting all the servers at once? Can’t you manually start each instance now?

| username: TiDBer_5M9L07sN | Original post link

Yes. Is there any way to fix it?

| username: Hacker007 | Original post link

What is your startup sequence? Try starting PD first, then TiKV, and finally the TiDB server.
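A minimal Python sketch of the dependency reasoning behind a startup order. It assumes only the architectural dependencies (TiKV stores register with PD, and a TiDB server needs both PD and TiKV) and uses the standard-library graphlib module (Python 3.9+). The component names are plain labels, not TiUP identifiers, and nothing here actually starts or stops a process.

```python
from graphlib import TopologicalSorter

# Assumed dependency map: each component lists what must already be running.
# TiKV registers its stores with PD; a TiDB server needs both PD and TiKV.
DEPENDS_ON = {
    "pd": set(),
    "tikv": {"pd"},
    "tidb": {"pd", "tikv"},
}


def start_order(deps):
    """Return a start order in which every component follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Reverse of the start order: dependents stop before what they depend on."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    print("start:", " -> ".join(start_order(DEPENDS_ON)))
    print("stop: ", " -> ".join(stop_order(DEPENDS_ON)))
```

Run directly, it prints `start: pd -> tikv -> tidb` and `stop:  tidb -> tikv -> pd`. Encoding the dependencies rather than hard-coding a list keeps the order correct if more components are added to the map.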

| username: TiDBer_5M9L07sN | Original post link

Starting the components one by one on the single node in the order PD, TiKV, TiDB, all three start successfully, but the database is still inaccessible.

| username: Hacker007 | Original post link

Check the TiDB logs.

| username: TiDBer_5M9L07sN | Original post link

I started them directly from the command line and saw a FATAL message from TiKV.


I didn't see one from PD or TiDB.
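Since the FATAL entry is the key clue here, a minimal Python sketch for pulling FATAL lines out of component logs may help. It assumes the unified log format used by TiDB, TiKV, and PD, where each line starts with a bracketed timestamp followed by a bracketed level such as [FATAL]. The log paths passed on the command line (for example a tikv.log under the deploy directory) are hypothetical and depend on your deployment.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: lines follow the unified log format, e.g.
# [2022/07/01 10:00:01.000 +08:00] [FATAL] [lib.rs:465] ["some message"]
LEVEL_RE = re.compile(r"^\[[^\]]*\]\s+\[(?P<level>[A-Z]+)\]")


def lines_at_level(lines: Iterable[str], levels: Tuple[str, ...] = ("FATAL",)) -> List[str]:
    """Return the lines whose log level is one of `levels`, without trailing newlines."""
    hits = []
    for line in lines:
        m = LEVEL_RE.match(line)
        if m and m.group("level") in levels:
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path: str, levels: Tuple[str, ...] = ("FATAL",)) -> List[str]:
    # errors="replace" keeps scanning even if a crash left a partially written line.
    with open(path, encoding="utf-8", errors="replace") as f:
        return lines_at_level(f, levels)


if __name__ == "__main__":
    # Usage (hypothetical paths): python scan_logs.py tikv.log pd.log tidb.log
    for path in sys.argv[1:]:
        for hit in scan_file(path):
            print(f"{path}: {hit}")
```

Lines that do not match the assumed format (for example multi-line stack traces) are skipped rather than guessed at, so the output should be treated as a pointer into the full log, not a replacement for reading it.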

| username: TiDBer_CEVsub | Original post link

I recommend performing data recovery.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.