Prometheus

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Prometheus

| username: 普罗米修斯

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.4
[Encountered Problem: Phenomenon and Impact]
In the Dashboard, there are many alerts with alertname=“NODE_disk_write_latency_more_than_16ms” indicating disk latency warnings. These disks are mechanical disks for TiDB PD. Such alerts can be silenced and ignored, right? It’s quite annoying.

| username: MrSylar | Original post link

It is not recommended to ignore this. The PD’s disk operation is to refresh the transaction ID. If the disk operation is slow, the speed of allocating transaction IDs to compute nodes will be slow, which in turn will slow down the transaction response time. Therefore, it is recommended to use an SSD for PD.

| username: 小龙虾爱大龙虾 | Original post link

For the testing environment, you can ignore it directly. For the production environment, it is recommended to increase the configuration and follow the official recommended configuration.

| username: 普罗米修斯 | Original post link

When there are about 10 alarms, the system disk on the TiKV node and the dm-0 alarm, this is not the deployment path of the TiKV node.

| username: 像风一样的男子 | Original post link

I have ignored it. Warning-level alerts are too frequent and can be ignored.

| username: zhanggame1 | Original post link

Where do you enable Alertmanager settings? I haven’t figured it out.

| username: 普罗米修斯 | Original post link

| username: 普罗米修斯 | Original post link

Okay, I’ll try turning off these warnings first and see.

| username: 普罗米修斯 | Original post link

TiDB and PD are SAS. Later, I will apply to the leadership to replace the drive letter of PD.

| username: 像风一样的男子 | Original post link

http://alertmanager server IP:9093/#/alerts

| username: zhanggame1 | Original post link

Okay, thank you.

| username: dba远航 | Original post link

You can filter it out first, but if it’s in production, you need to face it seriously, indicating that IO has already encountered a bottleneck.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.