Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Collecting community (industry) use cases for TiDB 7 resource control
[TiDB Usage Environment] Production Environment
I'd like to collect how companies in the community use TiDB 7 resource control: which scenarios they use it for, how they implement it in practice, and which problems it solves for them.
Our company's current situation:
- Business isolation is currently done at the tidb-server level: core and non-core workloads are assigned to different tidb-server instances. There is no resource isolation inside the TiDB cluster and no priority tiering on TiKV.
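As a hedged sketch of how the same core/non-core split might be expressed with resource groups instead of separate tidb-server instances, the Python helper below only generates SQL text; it does not connect to a cluster. The group names, user names, and RU_PER_SEC values are hypothetical. The statements follow the `CREATE RESOURCE GROUP ... RU_PER_SEC = ... PRIORITY = ... BURSTABLE` and `ALTER USER ... RESOURCE GROUP ...` syntax of TiDB 7.x resource control; please check them against the documentation for your exact version.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One business workload mapped to a resource group (all values hypothetical)."""
    group: str
    ru_per_sec: int
    priority: str  # HIGH, MEDIUM or LOW
    burstable: bool
    users: list[str]


def resource_group_ddl(workloads: list[Workload]) -> list[str]:
    """Build CREATE RESOURCE GROUP / ALTER USER statements as plain strings."""
    statements = []
    for w in workloads:
        if w.priority not in ("HIGH", "MEDIUM", "LOW"):
            raise ValueError(f"unsupported priority: {w.priority}")
        stmt = (f"CREATE RESOURCE GROUP IF NOT EXISTS {w.group} "
                f"RU_PER_SEC = {w.ru_per_sec} PRIORITY = {w.priority}")
        if w.burstable:
            # BURSTABLE lets the group use idle capacity beyond its RU quota
            stmt += " BURSTABLE"
        statements.append(stmt + ";")
        for user in w.users:
            # Bind this application's database user to the group by default
            statements.append(f"ALTER USER '{user}' RESOURCE GROUP {w.group};")
    return statements


if __name__ == "__main__":
    plan = [
        Workload("rg_core", 6000, "HIGH", True, ["core_app"]),
        Workload("rg_noncore", 2000, "LOW", False, ["report_app", "batch_app"]),
    ]
    for sql in resource_group_ddl(plan):
        print(sql)
```

Binding by user keeps the application code unchanged; giving the core group HIGH priority plus BURSTABLE and capping the non-core group is one possible policy, not the only one.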
Sure, I’ll write one for you.
You can read these two articles first, although they are not based on hands-on practice:
For some small financial companies, the number of purchased systems is often large, but each system's database is small: usually tens or hundreds of gigabytes, and almost never terabytes. Under the previous deployment plan, each application had its own pair of primary and standby databases. These databases carry very little performance load, and their disk usage does not grow dramatically. TiDB itself is better suited to large databases, and its recommended deployment uses 11 servers (3 TiKV + 2 TiDB + 3 PD + 2 TiFlash + TiCDC, etc.). In this situation, dedicating a standard TiDB cluster to a single application is far too wasteful. If resource management is done well, however, one TiDB cluster can replace 10 or even more of the original MySQL databases. On the server side, this cuts at least 20 servers down to 11; on the operations side, it greatly reduces the DBA workload, because instead of managing 10 sets of databases with 20 primary and standby instances, the DBA manages a single TiDB cluster. The key to replacing 10 small MySQL instances with one TiDB cluster is therefore resource management: as long as it is done well, leadership will not have to worry that a performance problem in one system brings down all of them.
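The server arithmetic in the quoted article can be checked with a few lines of Python. This is a minimal sketch: the per-role counts come from the article's topology, and TiCDC = 1 is an assumption inferred only from the stated total of 11. It also computes the break-even point, the smallest number of applications for which primary/standby pairs need more servers than one shared cluster.

```python
from typing import Optional

# Per-role counts from the quoted topology; TiCDC = 1 is inferred (assumption)
TIDB_TOPOLOGY = {"TiKV": 3, "TiDB": 2, "PD": 3, "TiFlash": 2, "TiCDC": 1}


def mysql_servers(apps: int, instances_per_app: int = 2) -> int:
    """One primary plus one standby server per application."""
    return apps * instances_per_app


def tidb_servers(topology: Optional[dict[str, int]] = None) -> int:
    """A single shared TiDB cluster; its size does not depend on the app count."""
    topology = TIDB_TOPOLOGY if topology is None else topology
    return sum(topology.values())


def break_even_apps(instances_per_app: int = 2) -> int:
    """Smallest app count for which MySQL pairs need more servers than TiDB."""
    apps = 1
    while mysql_servers(apps, instances_per_app) <= tidb_servers():
        apps += 1
    return apps


if __name__ == "__main__":
    print("MySQL servers for 10 apps:", mysql_servers(10))  # 20
    print("Shared TiDB cluster servers:", tidb_servers())   # 11
    print("Break-even app count:", break_even_apps())       # 6
```

Under these assumptions, consolidation only saves servers from six applications upward, which is consistent with the article targeting 10 or more small databases.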
Got it, I've read both. They explain a lot, but some details are still not covered, such as monitoring and alerting, operational procedures, and formulas for capacity planning.
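Neither article gives a capacity formula, so the following is only one possible approach, not an official one: take an estimate of the cluster's total RU/s capacity (for example, the figure reported by TiDB's `CALIBRATE RESOURCE` statement), keep a percentage unallocated as headroom, and split the rest across applications by weight. The headroom percentage, the weights, and the function name are hypothetical.

```python
def split_ru_quota(total_ru: int, weights: dict[str, int],
                   headroom_pct: int = 20) -> dict[str, int]:
    """Split an estimated cluster RU/s capacity across applications by weight.

    total_ru     -- estimated cluster capacity in RU/s (e.g. from CALIBRATE RESOURCE)
    weights      -- relative share per application (hypothetical, chosen by the DBA)
    headroom_pct -- percentage kept unallocated for bursts and failover (assumption)
    """
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive integers")
    usable = total_ru * (100 - headroom_pct) // 100
    total_weight = sum(weights.values())
    # Integer division rounds each quota down, so the sum never exceeds `usable`
    return {app: usable * w // total_weight for app, w in weights.items()}


if __name__ == "__main__":
    print(split_ru_quota(10000, {"core": 3, "crm": 1, "report": 1}))
    # {'core': 4800, 'crm': 1600, 'report': 1600}
```

The resulting numbers could serve as starting RU_PER_SEC values for each resource group, to be revised once real consumption data is available.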
Yes, still waiting for experts to share real-world cases.
This is also worth noting:
- Enhanced observability related to resource control #49318 @glorv @bufferflies @nolouch
  As more and more users rely on resource groups to isolate business applications, resource control now provides richer per-resource-group data to help you observe resource group load and settings, so that issues can be quickly identified and accurately diagnosed. This includes:
  - Slow query logs now include the resource group name, RU consumption, and resource wait time.
  - Statement Summary Tables now include the resource group name, RU consumption, and resource wait time.
  - The variable `tidb_last_query_info` now includes the SQL RU consumption field `ru_consumption`, so you can obtain the resource consumption of the last statement executed in the session.
  - New database metrics broken down by resource group: QPS/TPS, execution time (P999/P99/P95), failure count, and connection count.
  - A new system table `request_unit_by_group` records the daily historical resource consumption of each resource group.
  For more information, refer to Slow Query Logs, Statement Summary Tables, and Resource Control Monitoring Metrics Detailed Explanation.
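As a small illustration of the `tidb_last_query_info` change, the value returned by `SELECT @@tidb_last_query_info;` is a JSON string, so a client can read `ru_consumption` from it after each statement. The sketch below only parses strings (fetching them through any MySQL-compatible driver is left out); the sample value is illustrative, and versions without this enhancement simply lack the field, in which case the helper returns None.

```python
import json
from typing import Iterable, Optional


def last_query_ru(raw: str) -> Optional[float]:
    """Return ru_consumption from a tidb_last_query_info value, or None if absent."""
    info = json.loads(raw)
    value = info.get("ru_consumption")
    return None if value is None else float(value)


def total_ru(raw_values: Iterable[str]) -> float:
    """Sum RU over several captured values, skipping ones without the field."""
    return float(sum(ru for ru in map(last_query_ru, raw_values) if ru is not None))


if __name__ == "__main__":
    # Illustrative value only; real start_ts and RU numbers will differ
    sample = ('{"txn_scope":"global","start_ts":446809472210829315,'
              '"for_update_ts":446809472210829315,"ru_consumption":4.5}')
    print(last_query_ru(sample))
```

Summing per-statement values this way can help attribute RU cost to individual code paths, while the resource-group metrics and `request_unit_by_group` cover the aggregate view.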
It keeps getting better. I regret upgrading so early; our company is still on 7.1.0.
+1 It’s already quite new.
By the way, the latest 7.1 patch release is 7.1.3; you might want to check it out.
Yes, this is mostly a testing feature.
For company practice, additional information is needed.