NODE_disk_write_latency_more_than_16ms Alert Fires Frequently

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: NODE_disk_write_latency_more_than_16ms alert fires frequently

| username: 心在飞翔

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]

After installing the TiDB cluster, we frequently get the NODE_disk_write_latency_more_than_16ms alert even when there is no significant load. The servers use Alibaba Cloud ESSD disks. For those of you running production clusters: have you disabled this alert or raised its threshold?
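For context, the alert name suggests it fires when the average write latency per completed write exceeds 16 ms. A minimal sketch of how such a figure can be derived on Linux is below. It assumes the rule follows the usual node_exporter approach (time spent writing divided by writes completed over an interval); check your cluster's alert rule file for the exact expression. It reads two snapshots of `/proc/diskstats`, and the device name `vda` is a hypothetical example.

```python
import time

# 0-based field indexes in a /proc/diskstats line:
# 0 major, 1 minor, 2 device name, ..., 7 writes completed, 10 ms spent writing
WRITES_COMPLETED = 7
MS_WRITING = 10


def read_write_counters(device, path="/proc/diskstats"):
    """Return (writes_completed, ms_writing) counters for one block device."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) > MS_WRITING and fields[2] == device:
                return int(fields[WRITES_COMPLETED]), int(fields[MS_WRITING])
    raise ValueError(f"device {device!r} not found in {path}")


def avg_write_latency_ms(before, after):
    """Average milliseconds per completed write between two counter snapshots."""
    writes = after[0] - before[0]
    ms = after[1] - before[1]
    return ms / writes if writes > 0 else 0.0  # no writes: report zero latency


def check(device="vda", interval_s=5.0, threshold_ms=16.0):
    """Sample the device twice and compare its average write latency to a threshold."""
    before = read_write_counters(device)
    time.sleep(interval_s)
    after = read_write_counters(device)
    latency = avg_write_latency_ms(before, after)
    return latency, latency > threshold_ms
```

Raising `threshold_ms` here corresponds to raising the alert threshold the question asks about; it changes when the condition trips, not the underlying disk latency.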

| username: Kongdom | Original post link

It looks like ESSD is not an SSD. Is it inherently slow at reads and writes? Generally, this alert only fires when there is a disk performance problem.

| username: onlyacat | Original post link

For PD/TiDB nodes, I think it's fine. For TiKV, it's better to use the machine's local SSD rather than this kind of cloud disk.

| username: 小龙虾爱大龙虾 | Original post link

The disk performance is too poor: 16 ms is worse than even enterprise-grade mechanical hard drives. :joy_cat:

| username: 有猫万事足 | Original post link

If you see this alert often, it just means the disk you are using is shared, and someone else happens to be running heavy I/O such as a backup or restore. That's why you get the alert.

| username: FutureDB | Original post link

You can check the disk-related panels in Grafana to see whether large writes occurred during the periods of high write latency. If there is no significant business activity but write latency is still high, the disk itself may have a problem.
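The triage described above (compare high-latency periods against write volume) can be sketched as follows. It assumes you have exported per-interval samples of write latency (ms) and write throughput (MB/s) from Grafana; the `Sample` type and the 20 MB/s "busy" cutoff are hypothetical choices for illustration, not TiDB defaults.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: str           # label for the interval, e.g. from a Grafana export
    write_latency_ms: float  # average write latency in the interval
    write_mb_per_s: float    # write throughput in the interval


def classify(samples, latency_threshold_ms=16.0, busy_mb_per_s=20.0):
    """Split high-latency samples into 'explained by load' and 'suspect disk'."""
    explained, suspect = [], []
    for s in samples:
        if s.write_latency_ms <= latency_threshold_ms:
            continue  # latency is within the threshold; nothing to explain
        if s.write_mb_per_s >= busy_mb_per_s:
            explained.append(s.timestamp)  # heavy writes coincide with the spike
        else:
            suspect.append(s.timestamp)    # slow writes with little traffic
    return explained, suspect


samples = [
    Sample("10:00", 3.0, 5.0),
    Sample("10:05", 25.0, 120.0),
    Sample("10:10", 30.0, 1.5),
]
explained, suspect = classify(samples)
print(explained, suspect)  # ['10:05'] ['10:10']
```

On a shared cloud disk, a "suspect" interval may also reflect another tenant's I/O, which this node's own metrics cannot distinguish.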

| username: zhanggame1 | Original post link

16 ms is nowhere near SSD performance; a normal SSD's latency is under 1 ms.

| username: dba远航 | Original post link

If your SSD shows significant latency under heavy read/write load (you can observe this with an I/O stress test) and this happens consistently, ask your service provider to check for abnormalities.
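A very rough stand-in for an I/O stress test is to time small synchronous writes. The sketch below writes 4 KiB blocks to a scratch file, calling `os.fsync` after each, and reports median and 99th-percentile latency. The directory, block size, and iteration count are assumptions; a dedicated tool such as fio gives far more controlled and repeatable results.

```python
import os
import statistics
import tempfile
import time


def fsync_write_latencies_ms(directory, count=200, block_size=4096):
    """Time `count` write+fsync calls of `block_size` bytes each, in ms."""
    payload = os.urandom(block_size)
    latencies = []
    # The scratch file is created in `directory` so it lands on the disk under test;
    # it is deleted automatically when the `with` block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        fd = f.fileno()
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the device, not just the page cache
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return (p50, p99) latency in ms; p99 uses a simple nearest-rank pick."""
    ordered = sorted(latencies)
    p50 = statistics.median(ordered)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return p50, p99


if __name__ == "__main__":
    p50, p99 = summarize(fsync_write_latencies_ms("."))
    print(f"p50={p50:.2f} ms  p99={p99:.2f} ms")
```

Run it from a directory on the disk you want to test; consistently high results with no other load on the node are the kind of evidence worth bringing to the provider.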

| username: Jack-li | Original post link

The disk's read/write performance is probably just low.

| username: Kongdom | Original post link

:joy: So the ranking is physical machine > virtual machine on a physical host > cloud server, huh? This is the first time I've felt the instability of cloud servers so directly.