Why does TiDB system monitoring show IO at full utilization, and from which angles can it be analyzed?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 系统监控io满了是怎么回事呢可以从几个方面分析呢

| username: zqk_zqk

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 6.1
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issues: Issue Symptoms and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

| username: 我是咖啡哥 | Original post link

By "IO being full", do you mean IO utilization? High IO utilization on its own is nothing to worry about; you need to look at other metrics as well. If latency is low and statements execute quickly, it just means the resources are being fully used.
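To make the point about utilization versus latency concrete, here is a minimal Python sketch. It derives utilization and average per-request latency from two counter samples, the same way `iostat` does from `/proc/diskstats` on Linux. The `DiskSample` type, the function names, and the 90% and 10 ms thresholds are illustrative assumptions, not TiDB settings or recommended limits.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Cumulative counters for one device (hypothetical container type)."""
    ios_completed: int  # reads + writes completed so far
    io_wait_ms: int     # total ms spent on reads + writes so far
    busy_ms: int        # total ms the device had I/O in flight so far


def io_summary(before: DiskSample, after: DiskSample, interval_ms: float):
    """Return (utilization %, average latency per request in ms)."""
    ios = after.ios_completed - before.ios_completed
    util_pct = 100.0 * (after.busy_ms - before.busy_ms) / interval_ms
    # Avoid dividing by zero when no request completed in the interval.
    await_ms = (after.io_wait_ms - before.io_wait_ms) / ios if ios else 0.0
    return util_pct, await_ms


def looks_saturated(util_pct: float, await_ms: float,
                    util_limit: float = 90.0,
                    await_limit_ms: float = 10.0) -> bool:
    """Assumed rule of thumb: busy AND slow means a real bottleneck."""
    return util_pct >= util_limit and await_ms > await_limit_ms
```

With this rule, a disk that is 100% busy but answers each request in about 1 ms is treated as fully used rather than overloaded, while the same utilization with 30 ms per request would be flagged.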

| username: Kongdom | Original post link

Check if there are any slow queries.

| username: zqk_zqk | Original post link

It looks like the TiKV IO metrics are maxed out, which makes inserts very slow overall.

| username: Jiawei | Original post link

One way to approach it: think about where TiDB uses IO, and check each of those places. Start by ruling out slow queries or large queries as the cause, then move on to other factors.

| username: zqk_zqk | Original post link

If there are slow queries, is optimizing the SQL the only option? Or are there any tuning methods for TiDB?

| username: zqk_zqk | Original post link

If there are slow queries, should I optimize the SQL or are there performance parameters in TiDB that can be optimized? I’m a newbie and have just started with TiDB.

| username: Kongdom | Original post link

Approach it from several angles and investigate them one by one.

| username: zqk_zqk | Original post link

Do you have any optimization materials?

| username: Kongdom | Original post link

You can refer to the performance tuning section of the official documentation.

| username: tidb狂热爱好者 | Original post link

Start by optimizing the slowest SQL statements. The dashboard has a ranking of statements by CPU usage; work through them one by one in that order. Alternatively, to keep the cluster stable, you can refuse to run long-running SQL, for example by stopping statements that take more than 60 seconds.
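As a rough sketch of both suggestions, the Python helpers below build the SQL text you could run from any MySQL-compatible client. They assume TiDB's `information_schema.slow_query` table and the `max_execution_time` system variable (in milliseconds); the helper names are hypothetical. Which statement types the limit applies to can differ between TiDB versions, so check the documentation for your version before relying on it.

```python
def top_slow_queries_sql(limit: int = 10) -> str:
    """Build a query listing the slowest user statements on this TiDB node."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT query_time, query FROM information_schema.slow_query "
        "WHERE is_internal = false "
        f"ORDER BY query_time DESC LIMIT {int(limit)}"
    )


def max_execution_time_sql(seconds: float, scope: str = "GLOBAL") -> str:
    """Build a SET statement; max_execution_time is given in milliseconds."""
    scope = scope.upper()
    if scope not in ("GLOBAL", "SESSION"):
        raise ValueError("scope must be GLOBAL or SESSION")
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    return f"SET {scope} max_execution_time = {int(seconds * 1000)}"


if __name__ == "__main__":
    print(top_slow_queries_sql(5))
    print(max_execution_time_sql(60))  # the 60-second example from the post
```

Trying the limit with `SESSION` scope first is a cautious way to see its effect before changing it for the whole cluster.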