Miss-peer-region-count does not decrease

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: miss-peer-region-count不下降

| username: 张小凉1

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5
[Encountered Issue: Phenomenon and Impact]
The cluster suddenly shows a large miss-peer-region-count, and the value does not decrease.
The number of regions on each node is balanced.
[Resource Configuration]
3 KV nodes, 3 PD nodes, 3 TiDB-server nodes
[Attachments: Screenshots/Logs/Monitoring]


[Screenshot: Enterprise WeChat Screenshot_16805755331514]

Please advise, experts.

| username: 裤衩儿飞上天 | Original post link

Check the status of each TiKV.

| username: 张小凉1 | Original post link

The status of the KV is all normal.

| username: 裤衩儿飞上天 | Original post link

  1. What does the cluster topology look like?
  2. Are there any disk errors?
  3. Is any scheduling stuck?
  4. Are there any errors in the TiKV logs?
| username: 张小凉1 | Original post link

Cluster topology:

| username: 张小凉1 | Original post link

There should be no disk errors.
How can I check the scheduling information?

| username: 裤衩儿飞上天 | Original post link

  1. Grafana monitoring: pd-scheduler
  2. Check the OS logs of each TiKV node
  3. PD logs
  4. Overall network situation
  5. Grafana monitoring: pd-heartbeat
  6. tiup cluster display XXX
| username: 张小凉1 | Original post link

PD log:
[WARN] [util.go:163] ["apply request took too long"] [took=157.598105ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/topology/tidb/" range_end:"/topology/tidb0" "] [response="range_response_count:6 size:1059"]

pd-heartbeat:

pd-scheduler:

tiup cluster display:

| username: h5n1 | Original post link

Use pd-ctl region check miss-peer to see which regions are affected. Then use pd-ctl region with the ID of an affected region to check that region's status.
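
To illustrate how the output of that check might be read, here is a minimal Python sketch. It assumes the JSON printed by region check miss-peer has been saved as text and has a top-level "regions" list whose entries carry a "peers" list with "store_id" fields; please verify this against the real output. The sample data, the store IDs 1 to 3, and the function name are hypothetical. The sketch counts, for each store, how many of the listed regions have no peer on it, which can hint at whether one TiKV store is the one lacking replicas.

```python
import json
from collections import Counter

# Hypothetical sample standing in for saved `region check miss-peer` output.
SAMPLE_OUTPUT = """
{"count": 3, "regions": [
  {"id": 101, "peers": [{"id": 1, "store_id": 1}, {"id": 2, "store_id": 2}]},
  {"id": 102, "peers": [{"id": 3, "store_id": 1}, {"id": 4, "store_id": 2}]},
  {"id": 103, "peers": [{"id": 5, "store_id": 2}, {"id": 6, "store_id": 3}]}
]}
"""


def stores_lacking_peers(raw_json, all_store_ids):
    """Count, per store, how many listed regions have no peer on that store."""
    data = json.loads(raw_json)
    lacking = Counter()
    # `regions` may be absent or null when nothing is reported.
    for region in data.get("regions") or []:
        present = {peer["store_id"] for peer in region.get("peers", [])}
        for store_id in all_store_ids:
            if store_id not in present:
                lacking[store_id] += 1
    return lacking


if __name__ == "__main__":
    # Store IDs 1..3 are an assumption for a cluster with 3 TiKV nodes.
    for store_id, missing in sorted(stores_lacking_peers(SAMPLE_OUTPUT, [1, 2, 3]).items()):
        print(f"store {store_id}: {missing} region(s) without a peer here")
```

If almost all affected regions lack a peer on the same store, that store is the first place to look; if the gaps are spread evenly, the cause is more likely on the scheduling side.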

| username: 张小凉1 | Original post link

When the same region is viewed in two different ways, the "pending_peers" values differ. Could this be the problem?

The number of miss-peer entries in config show is very large, and I am not sure how to count the exact number.
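
One possible way to get an exact count is to save the JSON from pd-ctl region check miss-peer and summarize it with a short script. The sketch below is only an illustration: it assumes each region entry may carry optional "pending_peers" and "down_peers" lists, and the sample data and function name are hypothetical.

```python
import json

# Hypothetical sample standing in for saved `region check miss-peer` output.
SAMPLE_OUTPUT = """
{"count": 3, "regions": [
  {"id": 201, "peers": [{"id": 7, "store_id": 1}, {"id": 8, "store_id": 2}],
   "pending_peers": [{"id": 9, "store_id": 3}]},
  {"id": 202, "peers": [{"id": 10, "store_id": 1}, {"id": 11, "store_id": 2}]},
  {"id": 203, "peers": [{"id": 12, "store_id": 1}, {"id": 13, "store_id": 2}],
   "down_peers": [{"peer": {"id": 14, "store_id": 3}, "down_seconds": 120}]}
]}
"""


def summarize_miss_peer(raw_json):
    """Return the number of listed regions and how many report pending or down peers."""
    data = json.loads(raw_json)
    regions = data.get("regions") or []
    return {
        "total": len(regions),
        # A missing or empty list counts as "no pending/down peers".
        "with_pending_peers": sum(1 for r in regions if r.get("pending_peers")),
        "with_down_peers": sum(1 for r in regions if r.get("down_peers")),
    }


if __name__ == "__main__":
    print(summarize_miss_peer(SAMPLE_OUTPUT))
```

Running the same summary a few minutes apart would show whether the total is actually shrinking or staying flat.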

| username: 张小凉1 | Original post link

In addition to miss-peer, there are also many empty-region and undersized-region entries.

| username: h5n1 | Original post link

Check if the disk and CPU of TiKV are busy.

| username: 张小凉1 | Original post link

Utilization is low for both.

| username: h5n1 | Original post link

Check the apply latency in the monitoring, under TiKV-Details > Raft propose.

| username: 张小凉1 | Original post link

[Image not available]

| username: TiDBer_pkQ5q1l0 | Original post link

Is there a timeout error reported for PD scheduling?

| username: 裤衩儿飞上天 | Original post link

  1. Please upload the PD and TiKV logs so we can troubleshoot.
  2. When providing monitoring data, it’s best to provide it for the same time period as the issue.
| username: h5n1 | Original post link

Can you post a screenshot of the miss-peer region monitoring, covering the period from before the issue occurred until now?