Help Needed: Locating Abnormal Traffic in TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 求助:tikv异常流量定位

| username: porpoiselxj

[TiDB Environment] Test
[TiDB Version] v6.1.1
[Issue Encountered] In Grafana, the TiKV traffic monitoring shows a surge every 10 minutes, and only one TiKV instance exhibits it. How can I identify the cause of the surge? (The traffic visualization page on the Dashboard does not show any significant reads or writes.)

| username: zhanggame1 | Original post link

Could the 10-minute interval come from GC, which runs every 10 minutes by default? Check the GC configuration.

| username: porpoiselxj | Original post link

GC does indeed occur every 10 minutes, but why does this situation only happen with one TiKV instance?

| username: zhanggame1 | Original post link

First, change the GC interval from its default and see whether the surge follows it. That will confirm or rule out GC.

| username: porpoiselxj | Original post link

Okay, let me give it a try.

| username: porpoiselxj | Original post link

It doesn’t seem to be related to GC. I changed the GC interval to 15 minutes, but the traffic peak still occurs every 10 minutes. I also checked the timestamps from before the change, and they don’t line up with the GC runs either: GC always starts at 05 minutes past the hour, while the traffic peaks come every 10 minutes.
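
For anyone repeating this check, the comparison can be done mechanically rather than by eye. The following is only a minimal Python sketch: the timestamps are made up for illustration, and `spike_period` and `coinciding` are hypothetical helper names, not part of any TiDB or Grafana tooling. It estimates the spike period and counts how many spikes fall close to a GC run.

```python
from datetime import datetime, timedelta
from statistics import median


def spike_period(spikes):
    """Return the median gap between consecutive spike timestamps."""
    ordered = sorted(spikes)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return timedelta(seconds=median(gaps))


def coinciding(spikes, gc_runs, tolerance=timedelta(seconds=30)):
    """Return the spikes that fall within `tolerance` of any GC run."""
    return [s for s in spikes if any(abs(s - g) <= tolerance for g in gc_runs)]


# Hypothetical timestamps, for illustration only (not taken from this cluster).
start = datetime(2023, 1, 1, 10, 0)
spikes = [start + timedelta(minutes=10 * i) for i in range(6)]       # every 10 minutes
gc_runs = [start + timedelta(minutes=5 + 15 * i) for i in range(4)]  # every 15 minutes

period = spike_period(spikes)
matches = coinciding(spikes, gc_runs)
print(f"spike period: {period}, spikes near a GC run: {len(matches)} of {len(spikes)}")
```

If the spike period stays at 10 minutes after the GC interval is changed, and only a fraction of the spikes land near a GC run, GC is unlikely to be the trigger.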

| username: linnana | Original post link

Monitor the disk I/O load for a while.

| username: porpoiselxj | Original post link

Disk I/O on the TiKV nodes’ data disks is constantly saturated, with several nodes at 99.9%.

| username: redgame | Original post link

That’s IO…

| username: porpoiselxj | Original post link

This has nothing to do with I/O; I was talking about traffic.

| username: Anna | Original post link

It is related. Disk reads and writes can affect many other things.

| username: Anna | Original post link

Disk I/O affects traffic mainly through the bandwidth and resources that disk reads and writes consume. When the disk is busy, each read or write takes longer, so requests take longer to process, which lowers overall system performance and network throughput.

When network requests end up reading from or writing to disk, as with cloud storage services, slow disk I/O can slow downloads and uploads and degrade the user experience.

To limit the impact of disk I/O on system performance, you can use faster disks, increase the memory cache, and reduce the number of disk I/O operations. Assigning sensible priorities to disk reads and writes, so that important tasks go first, also helps keep the system responsive.
