The database suddenly crashed and cannot be reached

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据库突然炸了,连不上

| username: TiDBer_QHSxuEa1

This afternoon, the database suddenly became unreachable a few times, but it resolved itself after about twenty minutes. I want to investigate the cause. What should I do?

| username: 像风一样的男子 | Original post link

Is there server monitoring available? First, check if the server’s memory and CPU are fully utilized.

| username: TiDBer_QHSxuEa1 | Original post link

When it crashes, I can't connect to the server, and even when a connection does go through, it's very laggy.

| username: Jasper | Original post link

Start with Grafana: open the System Info section of the Overview dashboard and check whether any server hit a resource bottleneck at the time of the fault, then analyze further based on the specific bottleneck.

| username: tidb菜鸟一只 | Original post link

A server that can't be reached, and that stays very laggy even when you do get connected, is generally a sign of memory exhaustion. You can check the sar (sysstat) records in the /var/log/sa directory.
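
The files under /var/log/sa are binary and are normally read back with the `sar` tool from sysstat. As a rough sketch, the helper below builds a `sar -r` (memory report) command for one time window. It assumes the common one-file-per-day naming `saDD`; the function name is made up for illustration, and some distributions use a different directory or file naming.

```python
from __future__ import annotations

from datetime import datetime

# Assumption: sysstat keeps one binary data file per day, named saDD
# (DD = day of month), in the directory mentioned in the post above.
SA_DIR = "/var/log/sa"


def sar_memory_command(start: datetime, end: datetime) -> list[str]:
    """Build a sar command that reports memory usage between start and end."""
    if start.date() != end.date():
        raise ValueError("start and end must fall on the same day")
    data_file = f"{SA_DIR}/sa{start.day:02d}"
    return [
        "sar",
        "-r",             # memory utilization report
        "-f", data_file,  # read recorded data instead of sampling live
        "-s", start.strftime("%H:%M:%S"),  # window start
        "-e", end.strftime("%H:%M:%S"),    # window end
    ]
```

On the affected server, the returned list could be passed to `subprocess.run(cmd, check=True)`. Swapping `-r` for `-u` gives the CPU report for the same window.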

| username: kkpeter | Original post link

We usually run into this when a table's statistics health is too low: the optimizer picks bad execution plans, which drives up server resource consumption and makes the database behave abnormally.
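
TiDB reports statistics health per table through `SHOW STATS_HEALTHY`. As a minimal sketch, assuming the rows have already been fetched with any MySQL client, the helper below picks out tables under a threshold and produces `ANALYZE TABLE` statements for them. The function name and the threshold of 60 are illustrative assumptions, not recommended values.

```python
from typing import Iterable, List, Tuple

# One row as returned by SHOW STATS_HEALTHY:
# (Db_name, Table_name, Partition_name, Healthy)
Row = Tuple[str, str, str, int]


def analyze_statements(rows: Iterable[Row], threshold: int = 60) -> List[str]:
    """Return ANALYZE TABLE statements for tables whose health is below threshold."""
    seen = set()
    statements = []
    for db, table, _partition, healthy in rows:
        key = (db, table)
        # Emit each table once, even if several partitions are unhealthy.
        if healthy < threshold and key not in seen:
            seen.add(key)
            statements.append(f"ANALYZE TABLE `{db}`.`{table}`;")
    return statements
```

Review the generated statements before running them, since analyzing a large table consumes resources of its own.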

| username: caiyfc | Original post link

I have encountered this before. It is highly likely that memory was exhausted; once the operating system kills the offending process, you can SSH into the machine normally again. First, confirm whether a node on the machine (most likely the TiDB node) was killed by the system. Then check the TiDB logs to find the SQL that consumes the memory and optimize it, and also adjust TiDB's memory control parameters. If TiKV is the cause, reduce TiKV's block cache parameters.
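
To confirm whether the system killed a node, one option is to scan the kernel log (for example the output of `dmesg -T`, or /var/log/messages on some distributions) for OOM-killer entries. The sketch below relies only on the `Killed process <pid> (<name>)` fragment that the Linux OOM killer logs; the rest of the line varies by kernel version, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# The Linux OOM killer logs a fragment like "Killed process 12345 (tidb-server)".
# Only this fragment is matched; surrounding text differs between kernels.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-kill entry found in lines."""
    found = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

If a name such as `tidb-server` or `tikv-server` shows up around the time of the outage, that points to which component's memory settings to look at first.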

| username: 昵称想不起来了 | Original post link

Have you checked the SQL for anomalies? Abnormal statements can show up as database access timeouts and high system load.

| username: 有猫万事足 | Original post link

I ran into this when the machine was underpowered and I had placed both PD and TiKV on the same machine. TiKV maxed out the CPU, and PD became unresponsive.

| username: YuchongXU | Original post link

Troubleshoot using the relevant system logs, database logs, and monitoring logs.

| username: xfworld | Original post link

The machine's hardware configuration is insufficient, which causes intermittent freezes.

| username: Fly-bird | Original post link

Is there any progress?

| username: zhanggame1 | Original post link

Check the logs: look at the PD, TiDB, and TiKV logs from the time of the incident.

| username: Kongdom | Original post link

We hit a case where there were too many connections, which left no idle connections available.

| username: cy6301567 | Original post link

Check the monitoring logs. Our machine restarted several times because the TiKV node ran out of memory.

| username: wzf0072 | Original post link

We have encountered this before; the TiDB Server node experienced an OOM (Out of Memory) issue. Check the TiDB logs.
