How to Identify Latency Increase Caused by Resource Group?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何识别resource group导致的延迟增高?

| username: h5n1

When running the sysbench oltp_read_write test with 32 threads, the duration increases significantly after a resource group is applied. How can we confirm that the resource group is causing the delay? The current resource control monitoring is still coarse and mostly resembles throughput monitoring. The SQL execution time breakdown also does not include the time spent waiting for resource group scheduling. Should related information be added to the trace, similar to Oracle's explicit resource manager wait event, resmgr:cpu quantum?

| username: CabinfeverB

In the Resource Control panel, under the Client column, there is a metric named Successful KV Request Wait Duration.

| username: h5n1

The unit of this metric is seconds, right? In my earlier screenshots it showed all zeros with no visible curve, and the average reads 0.019. After I lowered the RU value to 100, the value became more noticeable. I think similar metrics should also be added to the TiDB monitoring and to the SQL execution time statistics, so that issues can be traced from the top down.
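To illustrate why lowering the RU value makes the wait duration more visible, here is a minimal sketch of a token bucket in Python. It is a simplified model, not TiDB's actual resource control implementation: the function name simulate_wait, the fixed per-request RU cost, and the assumption that the bucket holds at most one second of RU are all hypothetical.

```python
def simulate_wait(ru_per_sec, request_cost_ru, arrival_times):
    """Return the wait (in seconds) of each request under a simple token bucket.

    Assumptions (illustrative only): the bucket starts full, holds at most
    one second of RU, requests are admitted in arrival order, and every
    request costs the same number of RU (no more than the bucket capacity).
    """
    capacity = float(ru_per_sec)
    tokens = capacity
    clock = 0.0  # time at which the previous request was admitted
    waits = []
    for arrival in arrival_times:
        # A request cannot be considered before it arrives or before
        # the previous request in the queue has been admitted.
        start = max(arrival, clock)
        # Refill tokens for the time elapsed since the last admission.
        tokens = min(capacity, tokens + (start - clock) * ru_per_sec)
        if tokens >= request_cost_ru:
            admit = start
        else:
            # Wait until the bucket has refilled enough for this request.
            admit = start + (request_cost_ru - tokens) / ru_per_sec
            tokens = float(request_cost_ru)
        tokens -= request_cost_ru
        clock = admit
        waits.append(admit - arrival)
    return waits


if __name__ == "__main__":
    burst = [0.0] * 20  # 20 requests arriving at the same instant
    for limit in (100, 10000):
        waits = simulate_wait(limit, 10, burst)
        avg = sum(waits) / len(waits)
        print(f"RU limit {limit}: average wait {avg:.3f}s, max wait {max(waits):.3f}s")
```

In this model, a burst that fits inside the bucket sees no wait at all, which matches a flat zero curve. Once the limit is low relative to the workload, later requests queue behind the refill rate and the wait grows with queue depth.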

| username: CabinfeverB

We will change the unit later.
We also plan to trace this waiting time in the future, but before the metric can be displayed per query, we need to settle its definition and how it is calculated.

| username: h5n1

I mean two things. 1. On the TiDB monitoring page, for example in the KV request section, these metrics should be directly visible so that issues caused by RU can be identified quickly. 2. The SQL execution statistics should also show the time spent because of RU. If you already have a better method, that would be fine too.

| username: redgame

It seems that currently on the TiDB monitoring page, there is indeed no direct display of latency information caused by resource groups.