Is TiKV continuously consuming a large amount of read IO? Mbps?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV持续占用大量读IO?Mbps?

| username: Zealot

【TiDB Version】7.1.2
Since Monday morning, TiKV read IO has been consistently high, even at night when the cluster is idle. How should I troubleshoot this?

Update: So does this Mbps figure not actually represent disk IO?

The read Mbps has been consistently high, but IO utilization has not risen along with it.

The traffic visualization shows no continuous data reads.

The Unified read pool CPU usage has been consistently increasing in line with the Mbps.

| username: xfworld | Original post link

  • Check whether any applications or services were querying data through TiDB SQL during that period.
  • Check whether a scheduled automatic analyze (statistics collection) was triggered.

| username: Zealot | Original post link

The service is used during the day, so there is no reason for reads to stay this high at night. I also compared against last week's data, and the current level is indeed abnormally high. I have already tried stopping all tasks, but the read IO remained very high.

| username: xfworld | Original post link

What is the difference between QPS and Statement OPS on the monitoring?

QPS counts all executed SQL commands, including use database, load data, begin, commit, set, show, insert, select, etc.

Statement OPS counts only business-related statements such as select, update, and insert, so it tracks the business workload more closely.

Check this metric as well.

| username: Zealot | Original post link

There doesn’t seem to be much change in QPS this week. I couldn’t find the Statement OPS you mentioned.

| username: 像风一样的男子 | Original post link

Check the Grafana monitoring to see whether Region balancing has been running continuously.

| username: 江湖故人 | Original post link

If Regions were being rebalanced, both reads and writes should increase, right?

| username: 江湖故人 | Original post link

Check Top SQL for statements whose performance has degraded.

| username: 小龙虾爱大龙虾 | Original post link

You can check the TiKV-Details => IO Breakdown panel in the monitoring system, which breaks read and write IO down by type. The totals may still differ somewhat from what you see at the disk level, but they are a useful reference.

| username: Zealot | Original post link

The performance degradation is visible to the naked eye :sob:

| username: Zealot | Original post link

The data from the past two days looks normal.

| username: 江湖故人 | Original post link

Then optimize these SQL statements and check whether an abnormal execution plan is the cause. Check the statistics of the tables involved in the Top SQL statements and whether the analyze operation completed successfully. :thinking:

| username: Zealot | Original post link

It does not look like an SQL issue. The cluster is mostly idle at night, and I have also shut off the SQL entry point, yet the Mbps did not change significantly. The CPU did change noticeably, though: after the SQL entry point was closed, CPU usage dropped significantly.

| username: 小龙虾爱大龙虾 | Original post link

Check the disk monitoring: is there really that much read volume? Could the MBps shown on that TiKV panel mean something different from what we assume?

| username: Zealot | Original post link

Do you mean checking the actual disk read/write volume on the Linux host of the TiKV node?

| username: 小龙虾爱大龙虾 | Original post link

There are monitoring panels for this, such as the overview panel, the node exporter panel, and the disk performance panel.

| username: Zealot | Original post link

Is this it?
[root@tidb-0010 ~]# iostat
Linux 3.10.0-1160.71.1.el7.x86_64 (tidb-0010)  12/19/2023  x86_64  (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.00    1.49    0.12    0.00   80.63

Device:      tps  kB_read/s  kB_wrtn/s      kB_read       kB_wrtn
nvme0n1   655.14    1765.72   18559.24  81744347529  859205994992
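
To compare the iostat figures above with a panel that reports MBps or Mbps, the per-device rates have to be converted to the same unit. The sketch below is only an illustration: the function names are made up for this example, and it assumes that iostat's "kB" means 1024 bytes. Also note that when iostat is run without an interval, its first report is an average since boot, so a recent change may not show up; something like `iostat -x 1 5` samples the current rate instead.

```python
# Minimal sketch: parse basic `iostat` output and convert the read rate.
# Assumption: iostat's "kB" is 1024 bytes. Function names are hypothetical.

BYTES_PER_KB = 1024

# The device table exactly as posted in the thread.
SAMPLE = """\
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
nvme0n1 655.14 1765.72 18559.24 81744347529 859205994992
"""


def parse_iostat_devices(text):
    """Return {device: (kB_read/s, kB_wrtn/s)} from basic iostat output."""
    rates = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the device table
            continue
        if fields[0].startswith("Device"):
            in_table = True   # header row of the device table
            continue
        if in_table and len(fields) >= 4:
            rates[fields[0]] = (float(fields[2]), float(fields[3]))
    return rates


def kb_per_s_to_mib_per_s(kb_per_s):
    """kB/s (1024-byte kB) to MiB/s."""
    return kb_per_s / 1024


def kb_per_s_to_mbit_per_s(kb_per_s):
    """kB/s (1024-byte kB) to megabits per second (10**6 bits)."""
    return kb_per_s * BYTES_PER_KB * 8 / 1_000_000


if __name__ == "__main__":
    for device, (read_kb, write_kb) in parse_iostat_devices(SAMPLE).items():
        print(device,
              f"read {kb_per_s_to_mib_per_s(read_kb):.2f} MiB/s",
              f"({kb_per_s_to_mbit_per_s(read_kb):.2f} Mbit/s),",
              f"write {kb_per_s_to_mib_per_s(write_kb):.2f} MiB/s")
```

With the numbers posted above, the since-boot average read rate works out to well under 2 MiB/s, which is the kind of figure to compare against whatever the TiKV panel reports.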

| username: 江湖故人 | Original post link

Do the TiKV network card traffic and the read IO follow the same trend?
To determine whether the anomaly is internal to the database, you can shut off the SQL entry point and see whether the TiKV network card traffic drops.

| username: Zealot | Original post link

This one? It seems that the IO is indeed not high.

| username: Zealot | Original post link

How do you check the network card traffic? This is the ECS monitoring.
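
On the TiKV host itself, one way to look at network card traffic is to sample the kernel's per-interface byte counters twice and take the difference; `sar -n DEV 1` from sysstat reports the same kind of data. The following is a rough sketch rather than an official tool: it assumes a Linux host that exposes `/proc/net/dev`, and all function names are invented for this example.

```python
# Minimal sketch: estimate per-NIC throughput from two /proc/net/dev samples.
# Assumption: Linux host with /proc/net/dev. Function names are hypothetical.
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbit_per_s(before, after, seconds):
    """Return {interface: (rx_mbit_s, tx_mbit_s)} between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue                        # interface appeared mid-sample
        rx0, tx0 = before[name]
        rates[name] = ((rx1 - rx0) * 8 / 1_000_000 / seconds,
                       (tx1 - tx0) * 8 / 1_000_000 / seconds)
    return rates


def sample_rates(interval=5.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return rates."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return rates_mbit_per_s(before, after, interval)
```

Calling `sample_rates(5)` on the TiKV node returns the receive and transmit rate of each interface in Mbit/s over a five-second window, which can then be compared with the read trend on the TiKV panel.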