Issues Related to Resource Control in Version 8.1.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 8.1.0 资源管控相关问题

| username: GreenGuan

Hi, the OS is openEuler 22.03. While running a sysbench stress test, I tried to calibrate resource control capacity and got the error below. Could you please help take a look? I also saw in the documentation that this issue can occur when the following two metrics are missing. How can I enable them?
resource_manager_resource_unit
process_cpu_usage

ERROR 1105 (HY000): The workload in selected time window is too low, with which TiDB is unable to reach a capacity estimation; please select another time window with higher workload, or calibrate resource by hardware instead

| username: WalterWj | Original post link

Try adding the UTC+8 time zone offset:
CALIBRATE RESOURCE START_TIME '2024-05-28 15:09:00 +08:00' END_TIME '2024-05-28 15:16:00 +08:00';

Otherwise, there might be an issue with the estimation…

The error suggests that the cluster pressure is very low, making it impossible to estimate.

| username: GreenGuan | Original post link

No

| username: WalterWj | Original post link

Was the sysbench test run with the root user as well?

| username: GreenGuan | Original post link

It is root.

| username: GreenGuan | Original post link

Background: I am evaluating the RU capacity of TiDB 8.1.0 under a sysbench stress test. The test applies different loads by varying the concurrency, with each stage lasting 1800s and a 120s break between stages.

First stage (approximately): 14:30 ~ 15:00
Second stage (approximately): 15:00 ~ 15:30
Third stage (approximately): 15:30 ~ 16:00
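A rough sketch of how the exact stage windows could be derived from the 1800s stage length and 120s break described above, and turned into one CALIBRATE RESOURCE statement per stage. The start time of 14:30:00, the date 2024-05-28, and the +08:00 offset are assumptions borrowed from the earlier reply, and the helper names are hypothetical:

```python
from datetime import datetime, timedelta

STAGE_SECONDS = 1800  # length of each load stage, from the post
BREAK_SECONDS = 120   # pause between stages, from the post


def stage_windows(first_start, stages):
    """Return (start, end) pairs for consecutive stages separated by a break."""
    windows = []
    start = first_start
    for _ in range(stages):
        end = start + timedelta(seconds=STAGE_SECONDS)
        windows.append((start, end))
        start = end + timedelta(seconds=BREAK_SECONDS)
    return windows


def calibrate_statement(start, end, tz_offset="+08:00"):
    """Format a CALIBRATE RESOURCE statement in the form used earlier in the thread."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"CALIBRATE RESOURCE START_TIME '{start.strftime(fmt)} {tz_offset}' "
        f"END_TIME '{end.strftime(fmt)} {tz_offset}';"
    )


# Assumed start of the first stage; the post only gives approximate times.
first_start = datetime(2024, 5, 28, 14, 30, 0)
for start, end in stage_windows(first_start, 3):
    print(calibrate_statement(start, end))
```

Because of the 120s breaks, the later stages drift past the approximate half-hour marks, so a window taken from the rounded times may include idle seconds.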

Issue description: RU cannot be evaluated for the first stage; the following error is returned:

ERROR 1105 (HY000): The workload in selected time window is too low, with which TiDB is unable to reach a capacity estimation; please select another time window with higher workload, or calibrate resource by hardware instead

After talking with community support, the criterion for "low" still seems unclear. Could the development team please take a look?

| username: li_zhenhuan | Original post link

Assume each TiKV instance is allocated 8 vCPUs and there are 4 TiKV instances in total. If the cumulative CPU utilization of the 4 instances is 800%, then 8 vCPUs are in use. At this point, the overall TiKV utilization is:
800% / (8 * 4) = 0.25 > 0.2, so it can be evaluated at this time.
The same applies to TiDB. The evaluation can only succeed if, within the selected time window, the resource utilization of either TiDB or TiKV exceeds 0.2.
In the first stage, the cumulative utilization probably did not exceed 20%.
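The arithmetic above can be written as a small sketch. It assumes the 0.2 threshold and the "either TiDB or TiKV" rule exactly as described in this reply; the function names are hypothetical and this is not TiDB's actual implementation:

```python
THRESHOLD = 0.2  # utilization threshold quoted in the reply


def utilization(total_cpu_percent, vcpus_per_instance, instances):
    """Cumulative CPU usage (100% = one vCPU) divided by total allocated vCPUs."""
    return (total_cpu_percent / 100) / (vcpus_per_instance * instances)


def can_calibrate(tidb_util, tikv_util, threshold=THRESHOLD):
    """Per the reply, either component exceeding the threshold is enough."""
    return tidb_util > threshold or tikv_util > threshold


# 4 TiKV instances with 8 vCPUs each, using 800% CPU in total.
tikv_util = utilization(800, 8, 4)
print(tikv_util)                      # 0.25
print(can_calibrate(0.0, tikv_util))  # True
```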

| username: li_zhenhuan | Original post link

Additionally, when deploying multiple TiKV instances on a single machine, you need to use NUMA or cgroup to limit the CPU resources each TiKV instance can occupy. Otherwise, a single TiKV instance might end up using the CPU resources of the entire machine.

For example, suppose there are 3 servers with 32 vCPUs each, and each server hosts 4 TiKV instances, for a total of 12 TiKV instances. Without NUMA or cgroup limits, every instance is counted as having all 32 vCPUs, so the CPU utilization is calculated as follows:
3200% / (32 * 4 * 3) = 0.08 < 0.2. Although the actual resource utilization is high, the calculated value still does not exceed 0.2. Therefore, you need to use:

cgroup:
tikv:
  resource_control:
    memory_limit: 32G
    cpu_quota: 800%

NUMA:
tikv:
  numa_node: "0"

After setting these, you can see the total CPU usage per instance in the monitoring.
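To illustrate why the limit matters, here is a small sketch comparing the calculated utilization for the 12-instance example with and without a per-instance quota. It assumes that a cpu_quota of 800% makes each instance count as 8 vCPUs, as the reasoning above implies; the function name is hypothetical:

```python
def cluster_utilization(total_cpu_percent, vcpus_seen_per_instance, instances):
    """Cumulative CPU usage (100% = one vCPU) over the vCPUs each instance is counted as having."""
    return (total_cpu_percent / 100) / (vcpus_seen_per_instance * instances)


instances = 4 * 3    # 4 TiKV instances on each of 3 servers
total_usage = 3200   # cumulative TiKV CPU usage in percent, from the example

# No limit: every instance is counted as having the whole 32-vCPU machine.
unlimited = cluster_utilization(total_usage, 32, instances)

# cpu_quota: 800% -> each instance is counted as having 8 vCPUs (assumption).
limited = cluster_utilization(total_usage, 8, instances)

print(f"without limit: {unlimited:.2f}")  # 0.08, below 0.2
print(f"with quota:    {limited:.2f}")    # 0.33, above 0.2
```

The same workload moves from below to above the 0.2 threshold only because the denominator shrinks from 384 to 96 vCPUs.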