Background
A TiDB server is a stateless node, but its active database sessions may have transactions in flight. If the server is restarted abruptly, those transactions fail, which lowers the transaction success rate and degrades the experience for end users.
Many applications connect to a database through a load balancer and a connection pool. Before shutting down a TiDB server, we need to make sure of two things:
- The load balancer stops sending new connections to this TiDB server.
- All existing connections on this TiDB server are closed.
TiDB server has a parameter, graceful-wait-before-shutdown (see TiDB Configuration File | PingCAP Docs). If the value of this parameter is not 0, then when the TiDB server receives a shutdown signal it waits for the specified number of seconds before shutting down. While waiting, the TiDB server's HTTP status reports failure, so the load balancer's health check treats the node as down and routes new connections to other TiDB server nodes.
Meanwhile, most connection pools have a maxlifetime parameter, which specifies the maximum lifetime of a connection. When a long-lived database connection exceeds maxlifetime, it is closed once it is inactive. maxlifetime is usually a relatively long period, e.g. 10 minutes, so that new application requests can reuse existing connections in the pool and achieve better performance.
This means we often need to set TiDB's graceful-wait-before-shutdown to a value longer than the connection pool's maxlifetime, so that all connections can close and shutting down a TiDB server has minimal impact.
Issues
There are 2 issues we need to address in order to increase graceful-wait-before-shutdown:
- TiUP deploys TiDB server as a systemd service. A systemd service has a default TimeoutStopSec of 90 seconds. If the TiDB server does not shut down within 90 seconds, systemd kills it.
- The TiDB systemd config file /etc/systemd/system/tidb-4000.service is maintained by TiUP. Though we can modify this file, certain TiUP operations will overwrite it, removing all changes.
Solution
The solution is to create an override file, /etc/systemd/system/tidb-4000.service.d/override.conf. Say we want to increase graceful-wait-before-shutdown to 600 seconds: add the following content to override.conf (the extra 30 seconds is overhead):
[Service]
TimeoutStopSec=630
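One way to create this file from the shell is sketched below. The helper name write_override is hypothetical (it is not part of TiUP or systemd); it takes the drop-in directory and the timeout in seconds as arguments, so the same function works for other ports or values.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in that raises TimeoutStopSec for a unit.
# write_override is a hypothetical helper name used only for illustration.

# Usage: write_override DROPIN_DIR TIMEOUT_SEC
write_override() {
  local dropin_dir="$1"
  local timeout_sec="$2"
  # Create the <unit>.service.d directory if it does not exist yet.
  mkdir -p "$dropin_dir" || return 1
  # Write (or replace) override.conf with the new stop timeout.
  printf '[Service]\nTimeoutStopSec=%s\n' "$timeout_sec" > "$dropin_dir/override.conf"
}

# Example, run as root on the TiDB host:
#   write_override /etc/systemd/system/tidb-4000.service.d 630
```

Writing under /etc/systemd/system requires root privileges, which is why the example call is left commented out.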
Reload systemd:
$ sudo systemctl daemon-reload
Check systemd status:
$ systemctl status tidb-4000
● tidb-4000.service - tidb service
Loaded: loaded (/etc/systemd/system/tidb-4000.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/tidb-4000.service.d
└─override.conf
Active: active (running) since Sat 2024-06-22 15:18:26 UTC; 9min ago
Main PID: 1777 (tidb-server)
Tasks: 10 (limit: 18909)
Memory: 357.9M
CPU: 24.559s
CGroup: /system.slice/tidb-4000.service
└─1777 bin/tidb-server -P 4000 --status=10080 --host=0.0.0.0…
override.conf is loaded as a Drop-In, so TimeoutStopSec is now 630, and this override.conf will not be overwritten by TiUP.
To gracefully shut down the TiDB server:
$ tiup cluster stop {cluster_name} -N {tidb_ip}:4000 --wait-timeout 630
Be aware that TiUP has a default wait timeout of 120 seconds. Use the TiUP command-line option --wait-timeout to set it to a value longer than graceful-wait-before-shutdown.
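The relationships between the four timeouts discussed in this article can be sanity-checked before a rollout. The following is a minimal sketch: check_timeouts is a hypothetical helper, and the sample values in the comment are illustrative assumptions, not recommendations. It checks that graceful-wait-before-shutdown exceeds maxlifetime, and that both the systemd TimeoutStopSec and the TiUP --wait-timeout exceed graceful-wait-before-shutdown.

```shell
#!/usr/bin/env bash
# Sketch: check that the shutdown-related timeouts are ordered consistently.
# check_timeouts is a hypothetical helper; all values are in seconds.

# Usage: check_timeouts MAX_LIFETIME GRACEFUL_WAIT SYSTEMD_STOP TIUP_WAIT
check_timeouts() {
  local max_lifetime="$1"
  local graceful_wait="$2"
  local systemd_stop="$3"
  local tiup_wait="$4"
  local status=0

  # Connections must be able to reach maxlifetime within the grace period.
  if [ "$graceful_wait" -le "$max_lifetime" ]; then
    echo "graceful-wait-before-shutdown ($graceful_wait) should exceed maxlifetime ($max_lifetime)"
    status=1
  fi
  # systemd must not kill tidb-server before the grace period ends.
  if [ "$systemd_stop" -le "$graceful_wait" ]; then
    echo "TimeoutStopSec ($systemd_stop) should exceed graceful-wait-before-shutdown ($graceful_wait)"
    status=1
  fi
  # TiUP must not give up waiting before the grace period ends.
  if [ "$tiup_wait" -le "$graceful_wait" ]; then
    echo "--wait-timeout ($tiup_wait) should exceed graceful-wait-before-shutdown ($graceful_wait)"
    status=1
  fi
  return "$status"
}

# Example with illustrative values:
#   check_timeouts 540 600 630 630
```

The function prints one line per violated rule and returns a non-zero status if any rule is violated, so it can be used as a guard in a deployment script.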