Issues with Offline Upgrade Using TiUP

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiup离线升级问题

| username: magongyong

[TiDB Usage Environment] Production Environment
[TiDB Version] Upgraded from TiDB v5.4.0 to v5.4.3
Near the end of the upgrade, stopping node_exporter failed with the following error:

Because of this error, the upgrade cannot be resumed with tiup cluster replay audit_id. The error persists even after node_exporter has been manually stopped on the affected cluster servers.

[Encountered Issues]

  1. How can I manually complete the remaining steps? They are: stop node_exporter, stop blackbox_exporter, start node_exporter, and start blackbox_exporter.
  2. The bug has been fixed at this point and the MySQL client reports version 5.4.3 when connecting to the cluster, but tiup cluster display still shows the cluster and its components as v5.4.0.
  3. Reproducing the issue in a test environment shows that only running the upgrade again solves the problem. However, re-upgrading restarts the components and disconnects the business, so I would like to know whether there is a way to fix this without a restart. Thank you.
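For the four remaining steps in question 1, a minimal dry-run sketch is shown below. It only prints candidate commands and does not execute anything. It assumes the exporters run as systemd units named node_exporter-<port>.service and blackbox_exporter-<port>.service on the default ports 9100 and 9115; neither the unit names nor the ports are confirmed in this thread, so check them on the target hosts first (for example with systemctl list-units). The function name remaining_exporter_steps is hypothetical.

```python
from typing import List


def remaining_exporter_steps(node_port: int = 9100,
                             blackbox_port: int = 9115) -> List[List[str]]:
    """Build the stop/start commands in the order listed in question 1.

    ASSUMPTION: the unit names follow the "<component>-<port>.service"
    pattern and the ports are the defaults. Verify both on the real hosts.
    """
    units = [
        f"node_exporter-{node_port}.service",
        f"blackbox_exporter-{blackbox_port}.service",
    ]
    steps: List[List[str]] = []
    # First stop both exporters, then start both, matching the original order.
    for action in ("stop", "start"):
        for unit in units:
            steps.append(["systemctl", action, unit])
    return steps


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for cmd in remaining_exporter_steps():
        print(" ".join(cmd))
```

Running these commands by hand would restart the exporters only. It would not change the version that tiup cluster display reports.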

[Reproduction Path] Operations that led to the issue
Other node_exporter instances had been manually deployed on the component servers. I then ran the upgrade, which failed with an error. After manually stopping node_exporter, I ran tiup cluster replay audit_id, which failed again.

[Issue Phenomenon and Impact]
The upgrade failed with an error and exited.

So far the only visible impact is the incorrect version shown by tiup cluster display; whether there are other effects is unclear.

[Attachments]

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

| username: h5n1

Check if the port for the exporter is still being occupied.

| username: magongyong

No, they are all closed.

| username: h5n1

Have you checked with lsof and netstat?
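As a complement to lsof and netstat, the port check can also be scripted. The sketch below tests whether something still accepts TCP connections on a given port. The ports 9100 (node_exporter) and 9115 (blackbox_exporter) are assumed defaults, not values confirmed in this thread, and the helper name port_is_listening is hypothetical.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # ASSUMPTION: default exporter ports; adjust to the actual topology.
    for name, port in (("node_exporter", 9100), ("blackbox_exporter", 9115)):
        state = "still listening" if port_is_listening("127.0.0.1", port) else "free"
        print(f"{name} port {port}: {state}")
```

Run it on each affected server. A port that reports as still listening suggests that another exporter process, such as a manually deployed one, is holding it.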

| username: magongyong

I have checked with netstat.

| username: magongyong

On a test cluster, running tiup cluster upgrade again succeeded, but tiup cluster replay did not.

| username: h5n1

Take a look at the contents of the audit file.

| username: magongyong

| username: magongyong

Is there a command to manually execute the remaining steps?

| username: magongyong

I searched around, but there are few posts about upgrade issues, and I couldn’t find any articles explaining how the upgrade works internally. :joy:

| username: h5n1

The other monitoring components can be scaled in and then scaled out again, but these two exporters cannot. Instead, you can manually replace the exporter binaries and then edit the version number in the meta.yaml file under the .tiup directory so that the version is displayed correctly.
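The meta.yaml edit could be scripted roughly as follows. This is a sketch under two assumptions that the thread does not confirm: that the cluster version is stored in a top-level tidb_version key, and that the file sits under the .tiup directory at a path such as storage/cluster/clusters/<cluster-name>/meta.yaml. The function name set_cluster_version is hypothetical. Back up the file before changing it.

```python
import re


def set_cluster_version(meta_text: str, new_version: str,
                        key: str = "tidb_version") -> str:
    """Return meta_text with the value of the top-level `key` replaced.

    ASSUMPTION: the version is stored as an unindented "tidb_version: vX.Y.Z"
    line. Indented keys inside the topology are deliberately left untouched.
    """
    pattern = re.compile(rf"^({re.escape(key)}:[ \t]*)\S+", flags=re.MULTILINE)
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_version, meta_text, count=1
    )
    if count == 0:
        raise ValueError(f"no top-level '{key}' entry found")
    return updated


if __name__ == "__main__":
    # Illustrative content only; a real meta.yaml holds the full topology.
    sample = "user: tidb\ntidb_version: v5.4.0\ntopology:\n  global:\n    user: tidb\n"
    print(set_cluster_version(sample, "v5.4.3"))
```

The function works on text rather than on a parsed YAML document, so the rest of the file is preserved byte for byte. Reading the file, writing it back, and keeping a backup copy are left to the caller.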

| username: magongyong

Thank you very much, I’ll give it a try.

| username: jansu-dev

Is there any update?

| username: magongyong

In the end I only modified meta.yaml and touched nothing else. The cluster has been running for about a week without any impact.
