TiKV QPS shows 800M

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV 的QPS 显示有800M

| username: TiDBer_mCTc5877

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.5 (TiKV)
[Reproduction Path] One read, multiple writes
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
We have a 3-node cluster, and each server has two 3.5 TiB SSDs.
Currently, our service is JuiceFS + TiKV, and we have 2.2 billion files on JuiceFS.
We found that the QPS reported by TiKV monitoring reached 800M or even higher, which doesn’t seem reasonable. Is there an issue here, and how can we fix it?
[Attachment: Screenshot/Log/Monitoring]

| username: TIDB-Learner | Original post link

A write also involves checking first and then modifying. Does this QPS value seem unusually high?

| username: TiDBer_mCTc5877 | Original post link

I also feel it’s a bit outrageous and don’t know how to troubleshoot. I’m a newbie here, do you have any pointers?

| username: TiDBer_mCTc5877 | Original post link

I can’t even count this QPS.

| username: 随缘天空 | Original post link

Is there a hotspot?

| username: zhh_912 | Original post link

Check which time period it is and compare it with the peak value of the next week.

| username: yytest | Original post link

It would be best to provide the underlying logs.

| username: 小于同学 | Original post link

Is it caused by a hotspot?

| username: zhaokede | Original post link

First, rule out business-related factors. Check if there is a business peak or if there have been any recent business system upgrades.

| username: TiDBer_mCTc5877 | Original post link

I’m not sure what a hotspot is. I am using TiKV as the metadata service for JuiceFS, and what is ultimately exposed is a file interface. The TiKV layer cannot perceive it…

| username: TiDBer_mCTc5877 | Original post link

I see that from mid-April 2024 to mid-May 2024, the value has kept accumulating and increasing, unlike others that show QPS in the thousands.

| username: TiDBer_mCTc5877 | Original post link

Since the business went online in April, the value has kept rising under regular use, showing a cumulative trend, as if the values are being added up. However, some models will not behave this way.

| username: 有猫万事足 | Original post link

This graph doesn’t look like a Grafana graph. Who provided it?
Why not ask in the JuiceFS community? After all, it’s JuiceFS that’s using TiKV, right?

| username: 友利奈绪 | Original post link

Eliminate slow queries, as the accumulation of slow queries can affect overall concurrency and the performance of the entire cluster.

| username: zhaokede | Original post link

There is also such a possibility.

| username: TiDBer_mCTc5877 | Original post link

This is the data captured from Prometheus, just displayed in a different style. I want to check the QPS value from the TiKV layer to see if there is any display issue and determine from the TiKV layer whether it is abnormal.

| username: 托马斯滑板鞋 | Original post link

Could it be that the wrong metric is being captured, for example a cumulative value such as CPU time or TiKV usage?

| username: TiDBer_QYr0vohO | Original post link

Can I see the detailed PromQL?

| username: 有猫万事足 | Original post link

Please provide the calculation method. Some metrics are cumulative, and when displaying them, you need to subtract the previous value from the current value. Could it be a similar issue? Without details, it’s unclear where the problem lies.
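
To illustrate the point about cumulative metrics, here is a minimal Python sketch of turning cumulative counter samples into a per-second rate by subtracting the previous value from the current one. The sample values and the function name `per_second_rates` are hypothetical and only for illustration; they are not taken from the original poster's data or from any TiKV tooling.

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def per_second_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative counter samples into (timestamp, per-second rate) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        # A counter normally only goes down when the process restarts;
        # in that case treat the new value as the increase since the reset.
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, delta / elapsed))
    return rates


# Hypothetical data: a counter scraped every 15 s that grows by 45,000 per scrape.
# Plotting the raw values would show ~800M, while the real rate is in the thousands.
samples = [(0.0, 800_000_000.0), (15.0, 800_045_000.0), (30.0, 800_090_000.0)]
rates = per_second_rates(samples)
print(rates)
```

If the panel plots the raw counter, it will keep climbing for as long as the process runs, which would match the cumulative trend described above. In PromQL the same subtraction is done by wrapping the counter in `rate(...)` over a time window rather than charting the counter directly.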