If TiKV and PD crash, the processes restart automatically. Where is the automation script, and how often does it check?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv和pd如果挂掉,会自动启动进程,自动化脚本是在哪里,多久检测一次

| username: 月明星稀

If TiKV and PD crash, it is observed that the processes will automatically restart. Where is the automation script located, how often does it check, or which process is responsible for the detection? Please help clarify, experts.

| username: TI表弟 | Original post link

I deployed with tiup and looked into this briefly. The processes are managed by systemd, the system and service manager used by CentOS and most other modern Linux distributions.

| username: Jellybean | Original post link

All TiDB cluster components restart automatically after a crash. This is handled uniformly by systemd, the operating system's system and service manager. When systemd restarts a service, the script it runs is usually located in the scripts subdirectory of the node's deployment directory.

In my experience, the restart usually happens within seconds. The exact interval is set by RestartSec in the service's unit file, so you can check it there.

| username: 江湖故人 | Original post link

systemd handles it, with a 15-second delay before the restart.

| username: 月明星稀 | Original post link

Under what circumstances will the process be restarted? Will it restart if the process goes down or loses its connection?

| username: Jellybean | Original post link

systemd is a daemon that runs in the background for as long as the system is up and monitors the processes it manages. If a managed process exits abnormally, systemd restarts it automatically. If the target service is set to start at boot, it is also started by systemd.
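
As a rough illustration of the behavior described above, here is a minimal shell sketch that imitates the visible effect of a restart policy with a fixed delay. It is not how systemd is implemented (systemd is notified when a child process exits rather than running a loop like this), and the function name supervise and its max_starts cap are hypothetical, added only so the demo terminates.

```shell
#!/bin/sh
# Conceptual sketch only: systemd does not use a shell loop like this.
# It imitates the visible effect of "restart on every exit, after a delay".

# supervise CMD RESTART_SEC MAX_STARTS
# Runs CMD; each time it exits (for any reason), waits RESTART_SEC seconds
# and starts it again, up to MAX_STARTS starts in total.
# MAX_STARTS is a hypothetical cap so the demo ends.
supervise() {
    cmd=$1
    restart_sec=$2
    max_starts=$3
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        starts=$((starts + 1))
        status=0
        sh -c "$cmd" || status=$?
        echo "start $starts exited with status $status"
        if [ "$starts" -lt "$max_starts" ]; then
            sleep "$restart_sec"
        fi
    done
}

# Demo: a command that always "crashes" with exit status 1.
# A delay of 0 keeps the demo fast; a real service might wait 15 seconds.
supervise 'exit 1' 0 3
```

The demo command is started three times in a row, each start following the previous exit, which is the pattern you would see in the service log when a process keeps crashing.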

| username: TIDB-Learner | Original post link

This reminds me of the --restart=always option for Docker containers.

| username: chenhanneu | Original post link

The files in the /etc/systemd/system/multi-user.target.wants directory are actually symbolic links to unit files in /etc/systemd/system/. Enabling a service to start automatically simply adds a symbolic link to its unit file in the .wants directory of a target (by default, /etc/systemd/system/multi-user.target.wants).

Running tiup cluster disable tidb-test disables auto-start and removes the TiDB-related links (pointing to /etc/systemd/system/xxxx) from the /etc/systemd/system/multi-user.target.wants directory. Running tiup cluster enable tidb-test enables auto-start and adds those links back to the /etc/systemd/system/multi-user.target.wants directory.
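
To make the link mechanism concrete, the following sketch mimics it inside a throwaway directory rather than in /etc/systemd/system. The unit name tikv-demo.service and the helper functions are hypothetical. It only illustrates the idea that enabling adds a symbolic link and disabling removes it, and it is not what tiup or systemctl actually execute.

```shell
#!/bin/sh
# Sketch only: works in a temporary directory, not in /etc/systemd/system.
# tikv-demo.service is a hypothetical unit name.
root=$(mktemp -d)
mkdir -p "$root/multi-user.target.wants"
: > "$root/tikv-demo.service"

# "Enable": add a symlink to the unit file in the target's .wants directory.
enable_unit()  { ln -sf "$root/$1" "$root/multi-user.target.wants/$1"; }
# "Disable": remove that symlink; the unit file itself stays in place.
disable_unit() { rm -f "$root/multi-user.target.wants/$1"; }
# Succeeds if the symlink is present.
is_enabled()   { [ -L "$root/multi-user.target.wants/$1" ]; }

enable_unit tikv-demo.service
is_enabled tikv-demo.service && echo "enabled"
disable_unit tikv-demo.service
is_enabled tikv-demo.service || echo "disabled"
```

On a real host, listing the multi-user.target.wants directory before and after running the tiup commands should show the same kind of change.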

root@xxxx: /etc/systemd/system/multi-user.target.wants# cat tikv-111111.service 
[Unit]
Description=tikv service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
CPUQuota=100%
LimitNOFILE=1000000
LimitSTACK=10485760
User=tidbtest
ExecStart=/bin/bash -c '/test/tidb-deploy/tikv-1111/scripts/run_tikv.sh'
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target

Parameter explanation:

  • Restart=always: always restart the service, regardless of why it exited.
  • RestartSec=15s: how long to wait after the service exits before restarting it.

When a component is deployed, the corresponding systemd service is added automatically. After the process exits, systemd runs the component's run_xxx.sh script again to restart it.
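
To check the restart settings on your own node, you can read them out of the unit file. The sketch below is one possible way to do it with awk. The helper name unit_value is hypothetical, and the sample file is a shortened copy modeled on the unit quoted above; on a real host you would pass the path of the actual unit file instead.

```shell
#!/bin/sh
# unit_value FILE KEY: print the value of KEY from the [Service] section
# of a systemd unit file (if KEY appears more than once, the last one wins).
unit_value() {
    awk -F= -v key="$2" '
        /^\[/ { in_service = ($0 == "[Service]") }
        in_service && $1 == key { val = substr($0, length(key) + 2) }
        END { if (val != "") print val }
    ' "$1"
}

# Sample unit file for the demo; replace with the real unit file path.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
Description=tikv service

[Service]
User=tidbtest
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
EOF

echo "Restart=$(unit_value "$unit" Restart)"
echo "RestartSec=$(unit_value "$unit" RestartSec)"
rm -f "$unit"
```

With the sample file, this prints Restart=always and RestartSec=15s, matching the values in the unit shown above.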

| username: 春风十里 | Original post link

This should not be controlled by a script, but by a program.

| username: Jayjlchen | Original post link

The poster above is correct. systemd manages the services, and the default configuration sets RestartSec=15s for auto-restart.

| username: wangccsy | Original post link

Is there a heartbeat program?

| username: dba远航 | Original post link

systemd handles it, with a 15-second delay before the restart.

| username: 这里介绍不了我 | Original post link

systemd handles it, with a 15-second delay before the restart.