Version Upgrade

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 版本升级

| username: 胡杨树旁

The cluster was originally on version 5.0.0 and we tried to upgrade it to version 6.3.0. The upgrade failed, but after logging into the database, the reported database version is 6.3.0, while tiup cluster list still shows the cluster version as 5.0.0. How can we determine whether each component was actually upgraded?
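
One way to check this from inside the cluster is to query information_schema.cluster_info, which lists the version that each running TiDB, PD, and TiKV instance reports. The sketch below assumes the mysql client is installed. The host, port, and user in the usage comment are placeholders rather than values from this thread, and mismatched_versions is a hypothetical helper name.

```shell
# List every instance whose reported version differs from the target version.
# Reads tab-separated "type<TAB>instance<TAB>version" rows on stdin.
# Prints nothing when all instances match.
mismatched_versions() {
  want="${1#v}"   # accept "v6.3.0" or "6.3.0"
  awk -F'\t' -v want="$want" '{ v = $3; sub(/^v/, "", v); if (v != want) print $1, $2, $3 }'
}

# Usage (placeholders: adjust host, port, and user to your cluster):
# mysql -h <tidb-host> -P <tidb-port> -u root -p -N -B \
#   -e "SELECT TYPE, INSTANCE, VERSION FROM information_schema.cluster_info" \
#   | mismatched_versions v6.3.0
```

The version shown by tiup cluster list comes from tiup's own metadata, so it can plausibly lag behind what the processes actually run if an upgrade aborted midway. Comparing it with the output above shows which components still need attention.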

| username: Kongdom | Original post link

Please provide the error message for the upgrade.

| username: 胡杨树旁 | Original post link

Upgraded again.

| username: Kongdom | Original post link

Can the TiDB nodes reach the PD leader node over the network?
If not, you can stop the cluster first, then upgrade, and finally start the cluster again.
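
The stop, upgrade, start sequence suggested here can be sketched as below. It assumes the cluster name and target version that appear later in this thread, and it uses the --offline flag of tiup cluster upgrade, which is intended for upgrading a stopped cluster; confirm with tiup cluster upgrade --help that your tiup version supports it. The function only prints the commands for review and runs nothing; offline_upgrade_plan is a hypothetical helper name.

```shell
# Print the stop / offline-upgrade / start sequence for review; nothing is executed.
offline_upgrade_plan() {
  cluster="$1"
  version="$2"
  printf 'tiup cluster stop %s\n' "$cluster"
  printf 'tiup cluster upgrade %s %s --offline\n' "$cluster" "$version"
  printf 'tiup cluster start %s\n' "$cluster"
}

offline_upgrade_plan mnn-cluster v6.3.0
```

An offline upgrade makes the cluster unavailable for the whole window, so it only fits when downtime is acceptable.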

| username: 胡杨树旁 | Original post link

51 and 52 are connected.

| username: Kongdom | Original post link

Is port 2379 open? Is there a firewall?
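
A quick way to test this from the TiDB node is a plain TCP connection attempt. The sketch below assumes bash and the coreutils timeout command are available; check_port is a hypothetical helper name, and a "closed or filtered" result cannot distinguish a firewall from a PD process that is simply not running.

```shell
# Report whether a TCP port accepts connections within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
check_port() {
  host="$1"
  port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed or filtered"
  fi
}

# Usage, run from the TiDB node (PD address as it appears in this thread):
# check_port 10.0.0.51 2379
```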

| username: 胡杨树旁 | Original post link

This error is now reported:
Error: stderr: : executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.0.52:22' {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin tar --no-same-owner -zxf /home/tidb/deploy/tikv-20162/bin/tikv-v6.3.0-linux-amd64.tar.gz -C /home/tidb/deploy/tikv-20162/bin && rm /home/tidb/deploy/tikv-20162/bin/tikv-v6.3.0-linux-amd64.tar.gz}, cause: Run Command Timeout

| username: caiyfc | Original post link

What is the command to execute the upgrade?

| username: 胡杨树旁 | Original post link

tiup cluster upgrade mnn-cluster v6.3.0

| username: caiyfc | Original post link

When upgrading, you can try adding --ssh system, which makes tiup use the system’s SSH client instead of its built-in one. The error above seems to be caused by a lack of permission to execute the command, leading to a timeout.

| username: 胡杨树旁 | Original post link

Running the upgrade with that option gives this error. Does it mean that mutual trust (passwordless SSH) and sudo have not been configured?

| username: xingzhenxiang | Original post link

Why not use the tidb user to execute the upgrade?

| username: 胡杨树旁 | Original post link

The cluster was deployed under the root user.

| username: xingzhenxiang | Original post link

Got it, this works too. Giving you a thumbs up.

| username: tony5413 | Original post link

Try again after configuring mutual trust (passwordless SSH) for the tidb user.
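
Before re-running the upgrade, it may help to verify that key-based login and passwordless sudo both work for the tidb user. The sketch below is one way to do that with standard OpenSSH options; check_trust is a hypothetical helper name, and the assumption that tiup needs passwordless sudo on the target hosts should be checked against your deployment.

```shell
# Succeed only if key-based SSH login and passwordless sudo both work.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# sudo -n fails instead of prompting.
check_trust() {
  user="$1"
  host="$2"
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" 'sudo -n true'
}

# Usage from the tiup control machine (10.0.0.52 is the host in the SSH error above):
# check_trust tidb 10.0.0.52 && echo "trust and sudo OK"
# If it fails, one common way to install the key is:
#   ssh-keygen -t rsa            # once, on the control machine
#   ssh-copy-id tidb@10.0.0.52   # repeat for every cluster host
```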

| username: 胡杨树旁 | Original post link

After configuring mutual trust for the tidb user, I re-ran the upgrade command and got the following error messages:

[2023/03/20 17:14:13.629 +08:00] [INFO] [base_client.go:299] ["[pd] cannot update member from this address"] [address=http://10.0.0.51:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.51:2379 status:IDLE: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.51:2379 status:IDLE"]
[2023/03/20 17:14:15.520 +08:00] [ERROR] [base_client.go:144] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.0.0.51:2379] error"]
[2023/03/20 17:14:13.918 +08:00] [ERROR] [region_cache.go:2272] ["loadStore from PD failed"] [id=5] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/03/20 17:14:19.618 +08:00] [ERROR] [error.go:321] ["encountered error"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] [stack="github.com/tikv/client-go/v2/error.Log\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/error/error.go:321\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:490\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:458"]
[2023/03/20 17:14:17.412 +08:00] [INFO] [client.go:791] ["[pd] tso stream is not ready"] [dc=global]
[2023/03/20 17:14:19.744 +08:00] [INFO] [base_client.go:299] ["[pd] cannot update member from this address"] [address=http://10.0.0.51:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.51:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.51:2379 status:READY"]
[2023/03/20 17:14:24.747 +08:00] [ERROR] [region_cache.go:2272] ["loadStore from PD failed"] [id=5] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/03/20 17:14:24.758 +08:00] [ERROR] [error.go:321] ["encountered error"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] [stack="github.com/tikv/client-go/v2/error.Log\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/error/error.go:321\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:490\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:458"]
[2023/03/20 17:14:24.766 +08:00] [ERROR] [base_client.go:144] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.0.0.51:2379] error"]
[2023/03/20 17:14:24.843 +08:00] [ERROR] [kv.go:243] ["fail to load safepoint from pd"] [error="context deadline exceeded"]
[2023/03/20 17:14:25.999 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=11.593182767s]
[2023/03/20 17:14:52.229 +08:00] [ERROR] [client.go:547] ["[pd] tso request is canceled due to timeout"] [dc-location=global] [error="[PD:client:ErrClientGetTSOTimeout]get TSO timeout"]
[2023/03/20 17:15:02.277 +08:00] [ERROR] [region_cache.go:2272] ["loadStore from PD failed"] [id=4] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/03/20 17:15:16.685 +08:00] [ERROR] [error.go:321] ["encountered error"] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] [stack="github.com/tikv/client-go/v2/error.Log\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/error/error.go:321\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:490\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220913051514-ffaaf7131a8d/internal/locate/region_cache.go:458"]
[2023/03/20 17:15:16.684 +08:00] [ERROR] [client.go:850] ["[pd] getTS error"] [dc-location=global] [stream-addr=http://10.0.0.51:2379] [error="[PD:client:ErrClientGetTSO]rpc error: code = Canceled desc = context canceled: rpc error: code = Canceled desc = context canceled"]


| username: 考试没答案 | Original post link

The current cluster status is as follows, take a look.

| username: 考试没答案 | Original post link

Did you follow this document?

| username: 胡杨树旁 | Original post link

The status of the cluster after the upgrade shows that port 4002 is not open.

| username: 考试没答案 | Original post link

Please share your upgrade steps. It seems like you haven’t upgraded yet, right?