Resources are sufficient, but there are some warnings in TiKV. Seeking advice from experts on how to optimize performance for tidb_tikvclient_backoff_seconds_count region_miss

translator_bot · June 22, 2024, 1:35pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 资源充足，但是tikv有一些warn报警，请教大佬如何优化性能 tidb_tikvclient_backoff_seconds_count region_miss

| username: TI表弟

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.5.0
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Phenomenon and Impact】
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】

Node warn logs:
TiKV:
[2023/03/20 19:30:25.553 +08:00] [WARN] [subscription_track.rs:143] ["trying to deregister region not registered"] [region_id=1391586]
[2023/03/20 19:32:01.364 +08:00] [WARN] [endpoint.rs:780] [error-response] [err="Region error (will back off and retry) message: \"region 1298525 is missing\" region_not_found { region_id: 1298525 }"]
[2023/03/20 19:32:46.285 +08:00] [WARN] [endpoint.rs:780] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 1300428, leader may Some(id: 1393359 store_id: 101059)\" not_leader { region_id: 1300428 leader { id: 1393359 store_id: 101059 } }"]

| username: wzf0072

No alert information provided.

| username: TI表弟

[Screenshot not available in the translation]

| username: TI表弟

tidb_tikvclient_backoff_seconds_count region miss

| username: wzf0072

The hardware does not meet the minimum requirements for a production environment. Can you upgrade the configuration first?

| username: TI表弟

I have plenty of resources left, far from being a bottleneck.

| username: wzf0072

Looking at this, it seems to have no impact on the business.

| username: TI表弟

Yes, but the warning alarms keep firing, which is uncomfortable, and it makes me worry that once the business writes more data, all kinds of problems will appear.

| username: tidb狂热爱好者

If you can understand this diagram, it means you really get it.

| username: tidb狂热爱好者

I told you what it is: the read and write operations have exceeded the capacity of your cluster.

| username: wzf0072

Could you please explain it?

| username: szza

The warning is likely caused by outdated information carried by requests when a region undergoes a split or a leader switch.
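
To check whether a region named in the warnings has actually split or changed leaders, its current state can be looked up from TiDB. The queries below are a sketch that assumes the `information_schema.TIKV_REGION_PEERS` and `information_schema.TIKV_REGION_STATUS` tables available in v6.5; region 1300428 is taken from the `not_leader` log line above. Note that the `not_leader` error already carries the new leader (peer 1393359 on store 101059), which the client can use to refresh its cache and retry. A region reported as missing, such as 1298525, may return no rows at all if it has since been merged away.

```sql
-- Current peers of the region from the not_leader warning.
-- IS_LEADER = 1 marks the peer that PD currently reports as leader.
SELECT REGION_ID, PEER_ID, STORE_ID, IS_LEADER, STATUS
FROM information_schema.tikv_region_peers
WHERE REGION_ID = 1300428;

-- Which table or index the region belongs to, plus its epoch version,
-- which is incremented whenever the region is split or merged.
SELECT REGION_ID, DB_NAME, TABLE_NAME, IS_INDEX, INDEX_NAME,
       EPOCH_VERSION, APPROXIMATE_SIZE
FROM information_schema.tikv_region_status
WHERE REGION_ID = 1300428;
```

If the leader reported now differs from the one in the log, or `EPOCH_VERSION` is higher when the second query is run again later, that is consistent with the stale-region-information explanation. On a large cluster `TIKV_REGION_STATUS` may be slow to query, so it is worth running it sparingly.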

| username: Jellybean

Usually, these warning messages have no impact and are part of the normal internal region scheduling process, which can generally be handled by the cluster itself. If you are concerned, you can monitor the cluster’s QPS, latency, and other metrics.
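
One way to watch latency and backoff per statement from SQL is the statement summary table. The query below is a sketch assuming `information_schema.STATEMENTS_SUMMARY` with its `SUM_BACKOFF_TIMES` and `BACKOFF_TYPES` columns; it lists the statement digests in the current summary window that were retried most often, so you can see whether region-miss backoffs concentrate on particular statements and whether their latency is affected.

```sql
-- Statement digests in the current summary window that hit backoff/retry,
-- most retries first. BACKOFF_TYPES holds "type:count" pairs per retry type.
SELECT DIGEST_TEXT,
       EXEC_COUNT,
       AVG_LATENCY,
       MAX_LATENCY,
       SUM_BACKOFF_TIMES,
       BACKOFF_TYPES
FROM information_schema.statements_summary
WHERE SUM_BACKOFF_TIMES > 0
ORDER BY SUM_BACKOFF_TIMES DESC
LIMIT 10;
```

If the affected statements keep a flat `AVG_LATENCY` while their backoff count rises, that supports the view that the warnings are routine region scheduling rather than a capacity problem.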

| username: tidb狂热爱好者

If this error is caused by read I/O limitations, it will recover on its own after a while; you don't need to do anything about it.

| username: 考试没答案

In the past, I also frequently encountered this error.

Did you have any frequent business operations when the error occurred?

| username: 考试没答案

Also, are there large tables, such as those with billions of rows, that are frequently subjected to DML operations? Check the statistics to see if the table has been analyzed.
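
A quick way to check whether a table's statistics are stale is TiDB's `SHOW STATS_HEALTHY` and `SHOW STATS_META` statements. In the sketch below, `db_name` and `table_name` are hypothetical placeholders for the table in question; for a partitioned table the statements may return one row per partition.

```sql
-- db_name / table_name are placeholders: substitute the real schema and table.
-- Healthy ranges from 0 to 100; a low value means many rows changed since the last ANALYZE.
SHOW STATS_HEALTHY WHERE Db_name = 'db_name' AND Table_name = 'table_name';

-- Row_count and Modify_count (rows modified since the last ANALYZE) per partition.
SHOW STATS_META WHERE Db_name = 'db_name' AND Table_name = 'table_name';
```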

| username: TI表弟

It’s not a very large amount, right? A partitioned table with over 2 billion rows, and more than a hundred inserts per second.

| username: 考试没答案

Could you show your analyze configuration? Could this be caused by analyze?
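
The analyze configuration asked about here lives mostly in global system variables, which can be listed in one statement. This is a sketch assuming the v6.5 variable names, which include `tidb_enable_auto_analyze`, `tidb_auto_analyze_ratio`, and the `tidb_auto_analyze_start_time` / `tidb_auto_analyze_end_time` window.

```sql
-- Lists the auto-analyze switch, the modification ratio that triggers
-- auto analyze, and the daily time window in which it is allowed to run.
SHOW GLOBAL VARIABLES LIKE 'tidb%auto_analyze%';
```

If the auto-analyze window overlaps with the times the warnings appear, analyze of the large partitioned table is worth investigating further; otherwise it is probably unrelated.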

| username: TI表弟

Analyze

| username: 考试没答案

select * from information_schema.analyze_status;
select * from mysql.analyze_jobs;