How to Gracefully Upgrade TiKV While Serving Business Traffic

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何优雅地带着业务升级 tikv版本

| username: TiDBer_gnAY93b4

Scenario: Our business directly uses the transactional API of community-edition TiKV through the github.com/tikv/client-go/v2 client. Each transaction gets a set of keys and then sets them, so a transaction can fail if TiKV detects a conflict.
Version: Planning to upgrade from TiKV version 4.0.5 to 5.2.1.
Deployment method: Using Docker for deployment, without using TiUP for management.

Few users seem to run TiKV directly, and we could not find an official upgrade procedure, so we simply performed a rolling restart: restart one node, query its status with pd-ctl to confirm it is up, wait 10 seconds, then proceed to the next node. During the upgrade, some TiKV transactions failed with conflict errors. Our logs confirm that these transactions were not executed concurrently, so a logical conflict is impossible.
Attached is a screenshot of the relevant error logs.

Is there a more graceful way to reduce or avoid these failures? Thank you.

| username: Jellybean | Original post link

Upgrading the nodes one by one in a rolling fashion is the right idea. If you want a more graceful upgrade that the business does not notice, first evict the leaders from the node, then upgrade it once it no longer holds any leaders.

This approach is transparent to the business and does not affect it. The downside is that for each node you must first evict the leaders, then upgrade, and then let the leaders migrate back, which is cumbersome and time-consuming.

If you have TiUP, the operation is simpler. If not, you have to adjust PD's scheduling manually and then upgrade.

| username: TiDBer_gnAY93b4 | Original post link

Can leader eviction be done through pd-ctl or tikv-ctl? If there is no command for it, is there a reference for doing it programmatically?

| username: xfworld | Original post link

Yes, pd-ctl can do this. Refer to these commands:

scheduler add grant-leader-scheduler 1 // Schedule all Region leaders to store 1
scheduler add evict-leader-scheduler 1 // Evict all Region leaders from store 1
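
To tell when eviction has finished before restarting a node, one option is to inspect the JSON printed by pd-ctl's store command. The following Python sketch assumes that output contains status.leader_count and store.state_name fields. The helper names are hypothetical, and the field names should be checked against the output of your PD version.

```python
import json


def leaders_evicted(store_json: str) -> bool:
    """Return True when the store reports zero Region leaders.

    Assumes the JSON shape {"store": {...}, "status": {"leader_count": N}}.
    """
    info = json.loads(store_json)
    status = info.get("status")
    if status is None:
        # No status block: treat as unknown rather than as drained.
        return False
    return status.get("leader_count", 0) == 0


def store_is_up(store_json: str) -> bool:
    """Return True when the store's state_name is "Up"."""
    info = json.loads(store_json)
    return info.get("store", {}).get("state_name") == "Up"


if __name__ == "__main__":
    # Hand-written sample for illustration, not real cluster output.
    sample = '{"store": {"id": 1, "state_name": "Up"}, "status": {"leader_count": 0}}'
    print(leaders_evicted(sample), store_is_up(sample))
```

In practice the JSON string would come from running pd-ctl against your PD endpoint and polling until leaders_evicted returns True.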

| username: TiDBer_gnAY93b4 | Original post link

Thank you :grinning:

| username: Jellybean | Original post link

Yes, you can check the link posted by the expert above.

Also, if you encounter any issues during the operation, you can ask questions in the forum.

| username: swino | Original post link

You can give it a try.

| username: dba远航 | Original post link

Can’t you use TiUP to upgrade directly?

| username: TiDBer_小阿飞 | Original post link

His environment is not managed with TiUP.

| username: heiwandou | Original post link

Evict the leader first and then upgrade.
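
The per-node sequence discussed in this thread (evict leaders, upgrade, let leaders return) can be written down as an ordered checklist. This is a minimal Python sketch, not an official procedure: the function name and the PD address are hypothetical, the container upgrade step depends on your Docker setup, and the scheduler name used for removal (evict-leader-scheduler-<store id>) is an assumption that may differ between PD versions.

```python
from typing import List


def rolling_upgrade_plan(store_ids: List[int],
                         pd_addr: str = "http://127.0.0.1:2379") -> List[str]:
    """Build the ordered steps for upgrading TiKV stores one at a time."""
    pd = f"pd-ctl -u {pd_addr}"
    steps = []
    for sid in store_ids:
        # 1. Move all Region leaders off this store.
        steps.append(f"{pd} scheduler add evict-leader-scheduler {sid}")
        # 2. Do not restart until the store holds no leaders.
        steps.append(f"# wait until '{pd} store {sid}' shows leader_count 0")
        # 3. Deployment-specific: replace the container image and restart.
        steps.append(f"# upgrade and restart the TiKV container for store {sid}")
        # 4. Confirm the store has rejoined the cluster.
        steps.append(f"# wait until '{pd} store {sid}' shows state_name Up")
        # 5. Drop the eviction so leaders can be balanced back (assumed name).
        steps.append(f"{pd} scheduler remove evict-leader-scheduler-{sid}")
    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_plan([1, 2, 3]):
        print(step)
```

Lines starting with # are manual checks rather than commands. Running scheduler show in pd-ctl lists the schedulers that currently exist, which helps confirm the name to remove.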

| username: andone | Original post link

Rolling upgrade