TiDB Server Fails to Start After Upgrading from 7.1 to 7.5

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 从7.1到7.5 tidb server 启动不了

| username: TiDBer_oDmMGRXY

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]


[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]
I followed the instructions in this document and created the two tables, but it still does not work.

| username: 像风一样的男子 | Original post link

Can you start from the beginning and explain what you did and what problems you encountered?

| username: TiDBer_oDmMGRXY | Original post link

tiup cluster upgrade tidb-test v7.5.0
The rolling upgrade ran into a problem on the TiDB server component: the upgrade was interrupted and the TiDB server could not be started.

| username: tidb狂热爱好者 | Original post link

You’re missing the table. Please post the TiDB logs.

| username: 像风一样的男子 | Original post link

Please post the TiDB failure log located at tidb-3306/log.

| username: tidb狂热爱好者 | Original post link

@pyuh-Beijing Check if there are any errors reported on other TiDB nodes. Use admin show ddl jobs to see which TiDB node is the owner, and then check if the owner has reported any errors.

| username: TiDBer_oDmMGRXY | Original post link

Log lines filtered by ERROR:
puacct.usage_sys: open /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: no such file or directory"]
[2023/12/26 02:42:46.254 +08:00] [ERROR] [cpu.go:65] [GetCgroupCPU] [error="error when reading cpu system time from cgroup v1 at /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: open /sys/fs/cgroup/cpu,cpuacct/system.slice/tidb-3306.service/cpuacct.usage_sys: no such file or directory"]
[2023/12/26 02:44:04.261 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.261 +08:00] [ERROR] [pd.go:236] ["updateTS error"] [txnScope=global] [error="rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"]
[2023/12/26 02:44:04.364 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.364 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.565 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.566 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:04.967 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:04.967 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]
[2023/12/26 02:44:05.769 +08:00] [ERROR] [pd_service_discovery.go:257] ["[pd] failed to update member"] [urls="[http://10.1.148.245:2379,http://10.1.148.246:2379,http://10.1.148.248:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2023/12/26 02:44:05.769 +08:00] [ERROR] [tso_dispatcher.go:493] ["[tso] getTS error"] [dc-location=global] [stream-addr=http://10.1.148.248:2379] [error="[PD:client:ErrClientGetTSO]get TSO failed, after processing requests"]

| username: TiDBer_oDmMGRXY | Original post link

[Image attachment; not available in the translated text]

| username: li_zhenhuan | Original post link

Running admin show ddl jobs; shows that DDL jobs are continuously being cancelled.
Running admin show ddl; identifies the DDL owner node, and the log on that TiDB owner node shows an error.
I recommend creating the directory manually and then restarting.

| username: li_zhenhuan | Original post link

The main reason is that several DDL statements must be executed during the upgrade. The upgrade fails because a directory needed on the DDL owner node does not exist and the DDL cannot create it, so the DDL does not execute successfully. Once the problem on the DDL owner node is fixed, the DDL statements run normally and the upgrade can complete.

| username: 江湖故人 | Original post link

Could we first keep one 7.5 TiDB node serving traffic, and then upgrade the remaining three 7.1 nodes by scaling them in and scaling out again?

| username: 江湖故人 | Original post link

Where did the logs in your screenshot come from? Is the disk out of space?

| username: zhanggame1 | Original post link

This is a known, long-standing issue: /tmp/tidb/ does not have the necessary permissions. You need to create the directory manually and grant the tidb user read and write permissions on it.
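
(Editor's sketch, not part of the original post.) Before restarting, the state of the directory can be checked with a small script. The following is a minimal Python sketch, assuming it is run as the same OS user that runs tidb-server (the tidb user mentioned above) and that /tmp/tidb is the directory in question. The function name check_writable_dir is hypothetical. It does not change ownership; if the directory belongs to another user, that still has to be fixed by an administrator.

```python
import os


def check_writable_dir(path, create=False, mode=0o755):
    """Report whether `path` is a directory the current user can read and write.

    Returns one of: 'missing', 'not a directory', 'not writable', 'ok'.
    If create=True, a missing directory is created first (parents included).
    """
    if not os.path.exists(path):
        if not create:
            return "missing"
        os.makedirs(path, mode=mode, exist_ok=True)
    if not os.path.isdir(path):
        return "not a directory"
    # Read, write, and execute (traverse) access are all needed to create
    # and list files inside a directory.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        return "not writable"
    return "ok"
```

For example, check_writable_dir("/tmp/tidb") run as the tidb user returns a short status string without modifying anything, and check_writable_dir("/tmp/tidb", create=True) creates the directory if it is missing.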

| username: dba远航 | Original post link

It feels like there is an issue with the connection to PD.