TiKV QPS shows 800M

username: TiDBer_mCTc5877

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.5 (TiKV)
[Reproduction Path] One read, multiple writes
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
We have a 3-node cluster, and each server has two 3.5 TiB SSDs.
Currently, our service is JuiceFS + TiKV, and we have 2.2 billion files on JuiceFS.
We found that the QPS reported by TiKV monitoring reached 800M or even higher, which doesn’t seem reasonable. Is there an issue here, and how can we fix it?

username: TIDB-Learner

A write also involves a read first and then the modification. Even so, doesn’t this QPS value seem unusually high?

username: TiDBer_mCTc5877

I also think it’s outrageous, but I don’t know how to troubleshoot it. I’m new to this. Do you have any pointers?

username: TiDBer_mCTc5877

I can’t even count this QPS.

username: 随缘天空

Is there a hotspot?

username: zhh_912

Check which time period this occurs in and compare it with the peak value of the following week.

username: yytest

It would be best to provide the underlying logs.

username: 小于同学

Is it caused by a hotspot?

username: zhaokede

First, rule out business-related factors. Check if there is a business peak or if there have been any recent business system upgrades.

username: TiDBer_mCTc5877

I’m not sure what a hotspot is. I am using TiKV as the metadata service for JuiceFS, and what is finally exposed is a file interface. The TiKV layer cannot perceive it…

username: TiDBer_mCTc5877

I see that from mid-April 2024 to mid-May 2024 the value has been increasing steadily, as if it were accumulating, unlike others that show QPS in the thousands.

username: TiDBer_mCTc5877

The regular use of the business has been increasing since it went online in April, showing a cumulative trend, as if the values are being added up. However, some models will not behave this way.

username: 有猫万事足

This graph doesn’t look like a Grafana graph. Who provided it?
Why not ask in the JuiceFS community? After all, it’s JuiceFS that’s using TiKV, right?

username: 友利奈绪

Eliminate slow queries, as the accumulation of slow queries can affect overall concurrency and the performance of the entire cluster.

username: zhaokede

There is also such a possibility.

username: TiDBer_mCTc5877

This is the data captured from Prometheus, just displayed in a different style. I want to check the QPS value from the TiKV layer to see if there is any display issue and determine from the TiKV layer whether it is abnormal.

username: 托马斯滑板鞋

Could it be that the wrong metric is being captured? For example, a cumulative value such as CPU time or TiKV usage rather than a per-second rate?
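One quick way to test this suspicion is to check whether the plotted series ever goes down. The following is only a rough sketch: the function name `looks_cumulative` and the sample numbers are made up for illustration and are not taken from this cluster. A real QPS series fluctuates, while a raw counter only grows.

```python
from typing import Sequence


def looks_cumulative(values: Sequence[float], tolerance: float = 0.0) -> bool:
    """Return True if the series never decreases, which is typical of a raw counter."""
    if len(values) < 2:
        # Not enough points to say anything.
        return False
    return all(b >= a - tolerance for a, b in zip(values, values[1:]))


# Hypothetical samples, for illustration only.
raw_counter = [1_000, 4_000, 9_500, 15_000, 21_000]
real_qps = [3_200, 2_900, 3_500, 3_100, 3_300]

print(looks_cumulative(raw_counter))  # True
print(looks_cumulative(real_qps))     # False
```

This is only a heuristic: a counter drops back to zero when the process restarts, so a series with a single sharp drop can still be cumulative.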

username: TiDBer_QYr0vohO

Can I see the detailed PromQL?

username: 有猫万事足

Please provide the calculation method. Some metrics are cumulative counters, and to display them as QPS you need to subtract the previous value from the current value and divide by the time between the two samples. Could it be a similar issue? Without details, it’s unclear where the problem lies.
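To illustrate the point about cumulative metrics, here is a minimal sketch of turning raw counter samples into a per-second rate. The function name `per_second_rate` and the sample values are hypothetical, and the 15-second spacing is just an assumed scrape interval. In PromQL the same idea is normally expressed by wrapping the counter in `rate(...)` over a time window.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative counter samples into (timestamp, per-second rate) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Skip duplicate or out-of-order samples.
            continue
        delta = v1 - v0
        if delta < 0:
            # Counter reset (e.g. process restart): assume it restarted from zero.
            delta = v1
        rates.append((t1, delta / dt))
    return rates


# Hypothetical samples: the raw counter is around 800M, but the real rate is small.
samples = [(0, 800_000_000), (15, 800_045_000), (30, 800_090_000), (45, 30_000)]
for ts, qps in per_second_rate(samples):
    print(ts, qps)
# 15 3000.0
# 30 3000.0
# 45 2000.0
```

In this made-up example, plotting the raw values would show roughly 800M, while the actual request rate is only a few thousand per second.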