Discussion on Which Resources Are Controlled by Resource Management

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 资源管控管控哪些资源讨论

| username: TiDBer_yyy

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.0
Question:
Is TiDB resource control implemented at the TiDB layer or the TiKV layer (CPU, IOPS, network IO)?

The official documentation states that resource control is done at the TiDB layer, while the TiKV layer schedules requests based on quotas. However, upon reviewing the code, it appears that both TiDB and TiKV CPU resources are calculated. Which one takes precedence?

Official description (Use Resource Control to Achieve Resource Isolation | PingCAP Docs)
The TiDB layer throttles user read and write requests based on the quotas set by the resource group the user is bound to. The TiKV layer schedules requests based on the priority mapped from the quotas. Through these two layers of control, resource isolation can be achieved to meet Quality of Service (QoS) requirements.
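
The TiDB documentation describes this TiDB-layer throttling as a token bucket. The following is a minimal sketch of that idea only, not TiDB's actual implementation; the class name `TokenBucket`, its parameters, and the numbers are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Toy token bucket: `fill_rate` tokens (think RUs) are added per second,
    and the bucket never holds more than `capacity` tokens."""

    def __init__(self, fill_rate, capacity, now=0.0):
        self.fill_rate = fill_rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = now

    def try_consume(self, cost, now):
        """Refill for the elapsed time, then deduct `cost` if enough tokens remain."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False  # a real system would make the request wait and retry


# Hypothetical quota: 100 tokens per second, bucket size 100.
bucket = TokenBucket(fill_rate=100, capacity=100)
results = [
    bucket.try_consume(80, now=0.0),  # full bucket, request admitted
    bucket.try_consume(80, now=0.1),  # only about 30 tokens left, throttled
    bucket.try_consume(80, now=1.0),  # bucket has refilled, admitted again
]
print(results)
```

A request that is refused here corresponds to one that has to wait for the bucket to refill; how TiKV then prioritizes admitted requests is a separate mechanism that this sketch does not model.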

Upon reviewing the code:
All relevant CPU resources for both TiDB and TiKV layers are calculated.

Reference to historical discussion:

Discussion on TiDB Resource Control Total Limit

| username: tidb狂热爱好者 | Original post link

In fact, we started doing our own resource isolation back on version 5.4, before TiDB had a resource control feature.
We divided the business into TP and AP.
AP tasks were scheduled for midnight, with SQL directed to a separate TiDB instance that only executed AP tasks and only accessed TiFlash.
This way, TiKV resources were always sufficient.
We grouped different business units to use different TiDB frontends, so if a particular TiDB instance had an issue, only the corresponding business unit would be affected.
In 5 years, we haven’t encountered any problems.

| username: wangccsy | Original post link

Got it.

| username: dba远航 | Original post link

Yes, not bad.

| username: kelvin | Original post link

Got it, learned something new.

| username: Billmay表妹 | Original post link

TiDB resource control involves resource management at both the TiDB and TiKV levels. Specifically, TiDB is responsible for managing CPU and network IO resources, while TiKV is responsible for managing disk IO (including IOPS) resources.

At the TiDB level, resource control is mainly achieved through the configuration of Resource Groups. A Resource Group is a concept in TiDB that binds sessions to user groups and defines resource quotas using Resource Units (RU) to manage CPU and network IO resources. You can enable resource control in the TiDB configuration and create and configure Resource Groups based on actual needs to achieve resource management at the TiDB level.
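
As a rough illustration of the Resource Group setup described above, the sketch below builds the SQL statements involved and prints them instead of executing them, so no database driver is assumed. The statement forms follow the TiDB documentation as I understand it; the group name `rg_app`, the user `app_user`, and the quota of 500 RU per second are hypothetical.

```python
def resource_group_statements(group, ru_per_sec, user, burstable=False):
    """Build the SQL to enable resource control, create a group, and bind a user.

    The names and quota passed in are illustrative; adjust them to your cluster.
    """
    create = f"CREATE RESOURCE GROUP IF NOT EXISTS {group} RU_PER_SEC = {ru_per_sec}"
    if burstable:
        # BURSTABLE lets the group exceed its quota when spare capacity exists.
        create += " BURSTABLE"
    return [
        "SET GLOBAL tidb_enable_resource_control = 'ON'",
        create,
        f"ALTER USER {user} RESOURCE GROUP {group}",
    ]


statements = resource_group_statements("rg_app", 500, "app_user", burstable=True)
for stmt in statements:
    print(stmt + ";")
```

The printed statements can be pasted into any MySQL-compatible client connected to TiDB.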

At the TiKV level, resource control is primarily achieved through scheduling control. By enabling scheduling control in the TiKV configuration, TiKV can manage disk IO resources, including limiting IOPS (input/output operations per second). This ensures that TiKV does not become overloaded when processing data, thereby maintaining the stability and performance of the entire system.

In summary, TiDB resource control involves managing resources at both the TiDB and TiKV levels, specifically controlling CPU, network IO, and disk IO (including IOPS). This effectively manages and optimizes the use of system resources, improving system performance and stability.

| username: 春风十里 | Original post link

Scheduling Backend Tasks (ADD INDEX and IMPORT INTO) to Specific TiDB Nodes

Starting from TiDB 7.2, a distributed framework was introduced. Its goal is unified scheduling and distributed execution of all backend tasks, with unified resource management for the tasks it integrates. The framework lets backend tasks (specifically ADD INDEX and IMPORT INTO) run on all TiDB nodes in the cluster, which improves their performance. TiDB 7.5 lets DBAs schedule resource-intensive backend tasks such as ADD INDEX and IMPORT INTO to specific TiDB nodes, isolating that load from the existing TiDB nodes and avoiding impact on business traffic. When tidb_service_scope is set to background on a node, the distributed framework schedules backend tasks onto that node. If it is not set, the node is not used for backend tasks.

The real breakthrough of this improvement is the ability to add TiDB nodes dynamically to absorb such bursts of backend work. If you need to import a large table, you can add several TiDB nodes to the cluster to complete the task without putting any additional pressure on the existing TiDB nodes. The same applies to adding indexes. After the task is completed, these nodes can be removed. This gives a smoother way to handle large tasks (adding indexes, importing large amounts of data) on production clusters.
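
The node selection rule described above can be sketched as follows. This only models the rule as stated in this post and is not the framework's code; the node names are hypothetical. On a real 7.5 cluster, a node is marked with something like `SET GLOBAL tidb_service_scope = 'background'` while connected to that node; treat that statement as an assumption to verify against the documentation.

```python
def eligible_nodes(node_scopes):
    """Return the TiDB nodes that would run backend tasks under the rule above.

    `node_scopes` maps a node name to its tidb_service_scope value; an empty
    string means the variable is not set on that node.
    """
    return [name for name, scope in node_scopes.items() if scope == "background"]


# Hypothetical cluster: two serving nodes plus one node added for a large import.
cluster = {
    "tidb-0": "",
    "tidb-1": "",
    "tidb-import-0": "background",
}
print(eligible_nodes(cluster))
```

How the framework behaves when no node at all is marked as background is not covered by this sketch and is worth checking in the documentation.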

| username: TiDBer_yyy | Original post link

Standard!

Is there any documentation for this?

Further questions:

  1. What standard does the TiKV layer use to limit IOPS? Is it consistent with the RU at the TiDB layer?
  2. Does the RU calculation include resources in TiKV? The code seems to calculate the CPU at the TiKV layer.
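
To make the second question concrete, here is a hedged sketch of how an RU estimate might combine request, byte, and CPU costs. The weights are the ones I recall from the resource control documentation (8 read requests, 2 read batches, 64 KiB read, 1 write request, 1 write batch, 1 KiB written, or 3 ms of CPU per RU) and should be treated as assumptions to verify. Counting TiKV CPU next to TiDB CPU reflects the reading of the code described in this thread and is not a confirmed fact.

```python
# Assumed amounts of work that cost one RU each (verify against the docs).
READ_REQUESTS_PER_RU = 8
READ_BATCHES_PER_RU = 2
READ_BYTES_PER_RU = 64 * 1024
WRITE_REQUESTS_PER_RU = 1
WRITE_BATCHES_PER_RU = 1
WRITE_BYTES_PER_RU = 1024
CPU_MS_PER_RU = 3


def estimate_ru(read_requests=0, read_batches=0, read_bytes=0,
                write_requests=0, write_batches=0, write_bytes=0,
                tidb_cpu_ms=0.0, tikv_cpu_ms=0.0):
    """Estimate RU consumption as a weighted sum of the inputs."""
    read_ru = (read_requests / READ_REQUESTS_PER_RU
               + read_batches / READ_BATCHES_PER_RU
               + read_bytes / READ_BYTES_PER_RU)
    write_ru = (write_requests / WRITE_REQUESTS_PER_RU
                + write_batches / WRITE_BATCHES_PER_RU
                + write_bytes / WRITE_BYTES_PER_RU)
    # Assumption from this thread: CPU time from both layers is counted.
    cpu_ru = (tidb_cpu_ms + tikv_cpu_ms) / CPU_MS_PER_RU
    return read_ru + write_ru + cpu_ru


example = estimate_ru(read_requests=8, read_batches=2, read_bytes=64 * 1024,
                      tidb_cpu_ms=3, tikv_cpu_ms=3)
print(example)
```

Under these assumed weights, a resource group's RU quota would already reflect TiKV-side work, which is the crux of the question.
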
| username: TiDBer_yyy | Original post link

This is still very important. Previously, I ran into a case where DM was replicating DDL while the DDL owner node was running slow offline statistics queries, so DM replication was blocked by those slow queries for a long time.

Off-topic: It would be better if we could manually choose which node is the DDL owner.

| username: TiDBer_yyy | Original post link

We are in the same situation today. However, when we run statistics queries for the operations team, they block online business traffic. Our analysis suggests that the slow queries may be saturating TiKV's IO.

| username: 春风十里 | Original post link

Column - TiDB 7.5 LTS Release丨Enhancing Stability and Cost Flexibility for Key Applications in Large-Scale Scenarios | TiDB Community (original title: 专栏 - TiDB 7.5 LTS 发版丨提升规模化场景下关键应用的稳定性和成本的灵活性 | TiDB 社区)

| username: 春风十里 | Original post link

To be honest, I haven’t tested it either. I saw the explanation in the official documentation and will test it when I have time. Overall, I think resource control is a very good feature, especially for small businesses. A standard TiDB production deployment requires 13 machines, and dedicating them to a single system is really wasteful. This is especially true for existing MySQL databases of only a few hundred gigabytes each, where a single TiDB cluster can replace several of them. With resource control, I can confidently consolidate multiple MySQL databases into one TiDB cluster. Without it, I would worry that one misbehaving database could bring down the entire cluster, which would be a big problem.

| username: TiDBer_yyy | Original post link

Understood. I asked Billmay表妹 about the reply above, and some IOPS flow control is indeed done at the TiKV level. The official documentation is very vague and doesn’t mention this part.

The feature you mentioned seems to be from after version 7.4. Does it apply to version 7.1.0?

| username: 春风十里 | Original post link

What I posted is supported in 7.5. It does not work in 7.1, so you might consider upgrading.

| username: zhaokede | Original post link

Does version 4.x support it? I’ve been hesitant to upgrade.

| username: 春风十里 | Original post link

Resource control is only available starting from version 7.1; version 4.x definitely does not support it.