Cluster Upgrade Failure

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群升级失败

| username: Jolyne

[Test Environment for TiDB]
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
Single-node 7.1.0 test environment online upgrade to 7.5.0 reported an error, seems related to tso, not very clear
[Encountered Issue: Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: Jasper | Original post link

The upgrade timed out. Try adding --wait-timeout 1200 during the upgrade and try again.

| username: terry0219 | Original post link

Is there an issue with the PD node?

| username: linnana | Original post link

Port 4000 is not accessible, 2 minutes timeout, check if the cluster process port is normal.

| username: 路在何chu | Original post link

You should have checked the cluster status before the upgrade. It’s unlikely that the network is disconnected.

| username: 小龙虾爱大龙虾 | Original post link

It’s just one machine. The image error might not be the key issue. Try looking through the logs again.

| username: dba远航 | Original post link

Connection to PD failed.

| username: Jellybean | Original post link

It looks like there was an issue when starting tidb-server to access pd. Please check if the pd status is normal first.

| username: Jolyne | Original post link

It is indeed a timeout. What could be the reason for this? Is it related to machine configuration and network?

| username: 路在何chu | Original post link

Take a look and see if the network of each node is normal.

| username: Jolyne | Original post link

I have only one node, deployed on a single machine.

| username: 江湖故人 | Original post link

It might still be due to low configuration. How long did it take in total the second time it succeeded?

| username: TIDB-Learner | Original post link

The port might be occupied and not released in time, causing a timeout.

| username: Jasper | Original post link

It’s not due to machine configuration or network issues; it’s simply that the upgrade process is relatively long. The default timeout is 2 minutes, so even if the upgrade process is normal, it will be considered a timeout if it exceeds 2 minutes. Therefore, it’s best to set this parameter to a larger value during the upgrade.

| username: Jolyne | Original post link

Okay, since there was no timeout error during previous upgrades, I’m curious if it’s because this time it’s a single-node test environment that caused the error.

| username: 麻烦是朋友 | Original post link

There are upgrade plans at the end of the year, just coming to take a look.

| username: 这里介绍不了我 | Original post link


| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.