Region is unavailable

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Region is unavailable

| username: 等一分钟

Can TiDB be upgraded directly from 4.0.4 to 6.1? After the upgrade it reports "Region is unavailable." What could be the problem?

| username: 等一分钟 | Original post link

[2023/03/08 13:55:49.765 +08:00] [WARN] [endpoint.rs:621] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 1600789, leader may Some(id: 1600791 store_id: 411332)\" not_leader { region_id: 1600789 leader { id: 1600791 store_id: 411332 } }"]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)\" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120783]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)\" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120782]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)\" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120784]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)\" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120785]
[2023/03/08 13:55:49.876 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)\" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120786]
[2023/03/08 13:55:49.876 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)\" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120787]
[2023/03/08 13:55:49.936 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)\" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120789]
[2023/03/08 13:55:49.985 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: \"peer is not leader for region 1903877, leader may Some(id: 1903879 store_id: 411332)\" not_leader { region_id: 1903877 leader { id: 1903879 store_id: 411332 } }))"] [cid=4120793]

| username: 等一分钟 | Original post link

Will this "not leader" error fix itself automatically?

| username: h5n1 | Original post link

Wasn't there a similar post about this before? From TiDB's perspective, the "not leader" error is normal: TiDB reads from TiKV based on its region cache, and if the region's leader has already migrated to another TiKV node, TiKV returns this error and TiDB retries against the correct node. For "Region is unavailable," you can troubleshoot as follows (a consolidated command sketch follows the list):

  1. Use tiup cluster display to check for any abnormal TiKV.
  2. Use pd-ctl config show to check if the replica count setting max-replicas is >= 3.
  3. Check regions:
    (1) Regions without a leader:
    pd-ctl region --jq='.regions[]|select(has("leader")|not)|{id: .id,peer_stores: [.peers[].store_id]}'
    (2) Regions with fewer than a certain number of peers:
    pd-ctl region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length==1) }'
  4. Check the regions of the affected tables or indexes:
    Use show table xx regions to get the region ID that appears in the error.
    Use pd-ctl region xxx to check that region's status.
  5. Refer to the troubleshooting scenarios in the official documentation:
    TiDB Troubleshooting Map | PingCAP Docs
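
Putting the checks above together, a rough command sketch (a sketch only; <cluster-name> and <pd-ip> are placeholders for your own topology, and 1600789 is just the region ID taken from the log earlier in this thread):

```shell
# 1. Any TiKV in Down / Disconnected state?
tiup cluster display <cluster-name>

# 2. Replica count: max-replicas should be >= 3
pd-ctl -u http://<pd-ip>:2379 config show

# 3a. Regions that currently have no leader
pd-ctl -u http://<pd-ip>:2379 region --jq='.regions[] | select(has("leader") | not) | {id: .id, peer_stores: [.peers[].store_id]}'

# 3b. Regions left with only a single peer
pd-ctl -u http://<pd-ip>:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length==1)}'

# 4. Inspect one suspect region by ID (e.g. taken from SHOW TABLE ... REGIONS)
pd-ctl -u http://<pd-ip>:2379 region 1600789
```
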
| username: 等一分钟 | Original post link

Thank you.

| username: 等一分钟 | Original post link

Some data cannot be queried now.

| username: h5n1 | Original post link

What data cannot be queried?

| username: 等一分钟 | Original post link

| username: 等一分钟 | Original post link

The program reports this error, and the data cannot be retrieved.

| username: 裤衩儿飞上天 | Original post link

Check the cluster monitoring to see whether it is constantly backing off or dropping leaders. If so, that needs to be resolved first.
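
If Grafana is not handy, these numbers can also be pulled straight from the Prometheus that tiup deploys. This is only a sketch: <prometheus-ip> is a placeholder, and the metric names are my assumption for recent TiDB/TiKV versions, so verify them against your deployment first.

```shell
# Rate of KV backoffs seen by TiDB, broken down by backoff type (assumed metric name)
curl -s 'http://<prometheus-ip>:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(tidb_tikvclient_backoff_seconds_count[1m])) by (type)'

# Leader count per TiKV store -- sudden drops suggest leader transfers or a down store
curl -s 'http://<prometheus-ip>:9090/api/v1/query' \
  --data-urlencode 'query=sum(tikv_raftstore_region_count{type="leader"}) by (instance)'
```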

| username: 等一分钟 | Original post link

[2023/03/08 15:47:27.453 +08:00] [WARN] [raft_client.rs:296] ["RPC batch_raft fail"] [err="Some(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"failed to connect to all addresses\") }))"] [sink_err="Some(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"failed to connect to all addresses\") })))"] [to_addr=192.168.3.64:20170]

| username: h5n1 | Original post link

There are more problems than just this, right?

| username: 裤衩儿飞上天 | Original post link

Check the network and the server load.
Start by looking at the overall monitoring to see what's going on.
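
For instance, a quick way to rule out basic connectivity to the TiKV address reported in the raft_client error above (a sketch only; run it from another node in the cluster and adjust to the tools actually installed there):

```shell
# 192.168.3.64:20170 is the to_addr from the "RPC batch_raft fail" log line above
ping -c 3 192.168.3.64               # basic network reachability
nc -vz 192.168.3.64 20170            # is the TiKV port accepting connections?
ssh 192.168.3.64 'uptime && df -h'   # rough look at load and disk space
```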

| username: 等一分钟 | Original post link

[ERROR] [transport.rs:163] ["send raft msg err"] [err="Other(\"[src/server/raft_client.rs:208]: RaftClient send fail\")"]

| username: 等一分钟 | Original post link

The network and load are both normal.

| username: h5n1 | Original post link

Please post the output of tiup cluster display.

| username: 等一分钟 | Original post link

[Screenshot in the original post]

| username: 等一分钟 | Original post link

Will this issue resolve itself automatically?

| username: 等一分钟 | Original post link

[Screenshot in the original post]

| username: 等一分钟 | Original post link

Does this have anything to do with upgrading directly from v4 to v6?

Yesterday another cluster was upgraded from v5 to v6 without any issues.