Issues and Solutions Encountered During In-Place Upgrades

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 原地升级遇到过的问题和解决方案

| username: gary

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] I would like to ask everyone about the issues and solutions encountered during in-place upgrades in the production environment, to learn from them.
[Encountered Issues: Problem Symptoms and Impact]

| username: Jellybean | Original post link

Before performing an in-place upgrade, it is essential to conduct compatibility testing for the new version, validate the business in a test environment in advance, and communicate with the business side to prepare for emergencies. Additionally, ensure a complete backup of the entire cluster’s data. If possible, it is recommended to set up a primary-secondary configuration to facilitate easy switching in case of issues.

| username: zhaokede | Original post link

The usual approach is to upgrade the test environment first and test the business workload there. Only if no issues appear should you consider upgrading the production environment.

| username: gary | Original post link

The test environment has already been adapted. I mainly want to know whether anyone has run into issues during in-place upgrades, so I can get a heads-up in advance :rofl:

| username: gary | Original post link

Are there any cases of issues encountered during in-place upgrades that you can share? I’d like to learn from them. :facepunch:

| username: zhanggame1 | Original post link

Take a look at my thread: tidb升级到7.5.0, tidb servers升级报错了怎么处理 (How to handle errors when upgrading the TiDB servers to 7.5.0) - TiDB Q&A community

| username: Soysauce520 | Original post link

With in-place upgrades, watch out for conflicts caused by multiple TiDB servers creating views at the same time. This was a classic issue in version 6.5, and there are several cases in the community. Keeping just one TiDB server can resolve it.

| username: gary | Original post link

Did you add a timeout parameter when upgrading in your case? I think I encountered this issue before, where an error occurred while creating the system table.

| username: gary | Original post link

Are you referring to leaving only one tidb-server node during an online upgrade to prevent conflicts caused by creating views? Would this affect online business?

| username: 这里介绍不了我 | Original post link

Is there a contingency plan in place?

| username: gary | Original post link

Yes, there is.

| username: 连连看db | Original post link

Major version upgrade or minor version upgrade?

| username: wakaka | Original post link

  1. Set up a downstream synchronization cluster for testing to evaluate business compatibility and performance.
  2. After testing is complete, if the original cluster needs to be upgraded in place, DDL operations should be stopped to avoid potential issues.
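
One way to confirm that DDL has actually stopped before starting the upgrade is to look at the `STATE` column returned by `ADMIN SHOW DDL JOBS`. The sketch below is only an illustration: it assumes the rows have already been fetched with a SQL client and converted to dicts, the helper names `pending_ddl_jobs` and `ddl_is_quiet` are hypothetical, and the set of terminal states is an assumption that should be checked against your TiDB version.

```python
# Terminal states as reported in the STATE column of ADMIN SHOW DDL JOBS.
# Assumption: every other state (for example "queueing" or "running") means
# the job is still in flight; verify this list against your TiDB version.
FINISHED_STATES = {"synced", "cancelled", "rollback done"}


def pending_ddl_jobs(rows):
    """Return the rows whose DDL job has not reached a terminal state."""
    return [
        row for row in rows
        if row["STATE"].strip().lower() not in FINISHED_STATES
    ]


def ddl_is_quiet(rows):
    """True when no DDL job is queued or running."""
    return len(pending_ddl_jobs(rows)) == 0


# Hypothetical sample rows; real rows would come from a SQL client.
sample_rows = [
    {"JOB_ID": 101, "JOB_TYPE": "create view", "STATE": "synced"},
    {"JOB_ID": 102, "JOB_TYPE": "add index", "STATE": "running"},
]

if __name__ == "__main__":
    for job in pending_ddl_jobs(sample_rows):
        print(f"job {job['JOB_ID']} ({job['JOB_TYPE']}) is still {job['STATE']}")
    print("safe to start upgrade:", ddl_is_quiet(sample_rows))
```

If `ddl_is_quiet` returns false, waiting for the listed jobs to finish before upgrading is probably safer than starting anyway.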

| username: zhanggame1 | Original post link

No parameters added.

| username: zhanggame1 | Original post link

Here is another one: tidb 7.5.0升级到7.5.1 dashborad登录不用了 (Dashboard login stopped working after upgrading TiDB from 7.5.0 to 7.5.1) - TiDB Q&A community

| username: ffeenn | Original post link

I did an upgrade at the beginning of the year. This was my plan:

  1. It is necessary to thoroughly understand the differences between the two versions.
  2. Clone two identical clusters in the test environment (check whether parameters and other configurations have been modified).
  3. Test whether there are any issues during the upgrade process for the two clusters.
  4. After the upgrade is complete, conduct application testing and perform full functionality testing or use the application for a period of time, which is generally one month in my case.
  5. After testing is complete, prepare for the online upgrade. List the upgrade precautions and steps (summarized during testing, very important).
  6. Perform a comprehensive data backup. There are two upgrade plans: the first is to create a new cluster and migrate to it, and the second is a direct in-place upgrade. I recommend the first. For my production setup I used the first method, with TiCDC for data synchronization, because it allows switching back at any time. The second method, direct upgrade, carries risks. It can be done, because you have already tested it and there are generally no major issues, but remember that a data backup is essential.
  7. If you follow the first plan, observe the new cluster for one to two months after the upgrade is complete, and then destroy the old cluster.
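
For step 2, the check for modified parameters can be done by diffing two setting snapshots, one from production and one from the cloned test cluster. The following is a minimal sketch, assuming each snapshot has already been exported to a `{name: value}` dict (for example from `SHOW GLOBAL VARIABLES` or `SHOW CONFIG`). The function name `diff_settings` and the sample values are hypothetical.

```python
MISSING = "<missing>"  # placeholder for a setting present on only one side


def diff_settings(baseline, candidate):
    """Compare two {name: value} snapshots.

    Returns {name: (baseline_value, candidate_value)} for every setting
    that differs or exists on only one side.
    """
    diffs = {}
    for name in sorted(set(baseline) | set(candidate)):
        old = baseline.get(name, MISSING)
        new = candidate.get(name, MISSING)
        if old != new:
            diffs[name] = (old, new)
    return diffs


# Hypothetical snapshots; the values are made up for illustration.
production = {
    "tidb_mem_quota_query": "1073741824",
    "max_connections": "0",
    "tidb_enable_clustered_index": "ON",
}
test_clone = {
    "tidb_mem_quota_query": "2147483648",
    "max_connections": "0",
}

if __name__ == "__main__":
    for name, (old, new) in diff_settings(production, test_clone).items():
        print(f"{name}: production={old} test={new}")
```

An empty result suggests the clone matches production for the exported settings, so upgrade tests on the clone are more likely to reflect what production will do.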

| username: TiDBer_aaO4sU46 | Original post link

Make sure to back up. Everything else is fine.

| username: buddyyuan | Original post link

I have encountered several issues with in-place upgrades:

  1. The issue of conflicts caused by multiple TiDB servers creating views simultaneously.
  2. Another time, setting the tidb_enable_amend_pessimistic_txn caused a conflict issue.
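
For the second item, it may be worth checking the variable before the upgrade starts. This is a rough sketch only: it assumes the variables were read with `SHOW GLOBAL VARIABLES` into a dict, and it assumes that an enabled value is what needs attention, which the post above does not state explicitly. The helper names are hypothetical.

```python
WATCHED_VARIABLE = "tidb_enable_amend_pessimistic_txn"


def is_enabled(value):
    """Boolean system variables may be shown as ON/OFF or 1/0."""
    return str(value).strip().upper() in {"ON", "1"}


def amend_txn_needs_attention(variables):
    """True when the watched variable is present and enabled.

    Assumption: a missing variable is treated as not enabled, since it
    may not exist in every TiDB version.
    """
    return is_enabled(variables.get(WATCHED_VARIABLE, "OFF"))


if __name__ == "__main__":
    # Hypothetical snapshot of global variables.
    snapshot = {"tidb_enable_amend_pessimistic_txn": "ON"}
    if amend_txn_needs_attention(snapshot):
        print(f"{WATCHED_VARIABLE} is enabled; review it before upgrading")
```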

| username: gary | Original post link

Yes, migrating to separate machines is definitely the best choice, but this customer does not have enough resources for that. :weary:

| username: TiDBer_5cwU0ltE | Original post link

The plan is to first pass the tests, learn from the pitfalls, and only then move to production. A fallback plan must also be prepared.