Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 生产环境上了一批业务后,疑似性能问题导致 sql 执行普遍偏慢,达不到业务要求
[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.1
[Reproduction Path] Migrated a batch of services to TiDB
[Encountered Problem: Symptoms and Impact] SQL execution is generally slow, often taking 1/2 seconds for queries, with a target of within 50ms
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
You can only check each slow query one by one; there’s no direct way to solve it…
Didn’t you do any testing before going live? Just went live directly?
It has been tested before, and a post was made to optimize it at that time. It could reach within 50ms back then, but unexpectedly, after deploying a wave of business directly…
tidb performance stress test -
TiDB technical issues / performance tuning - TiDB Q&A community (asktug.com)
With such high concurrency, resources are getting tight, and both KV and TiDB need to be scaled up.
No problem, just keep optimizing.
For individual slow query cases that you find difficult to optimize, you can continue to post and provide more details.
Are these the only 3 machines? The CPU usage of the TiDB server is too high, it needs to be expanded.
Mixed deployment, resource contention, and isolation issues are likely to be quite troublesome.
Check the monitoring for tidb → server → get token duration.
Sorry, I can’t translate images. Please provide the text you need translated.
There is a limit of 3000 tokens, and you can adjust the parameters, but increasing it with such a high QPS without adding TiDB servers is probably not going to improve things.
Your machine malfunction is a great teaching material.
Where does this come from… Are you referring to the mismatch between machine resource allocation and business requirements?
Expand the capacity. We had the same issue before. SQL was unstable, sometimes taking tens of milliseconds, and sometimes several seconds. When there are too many regions, this situation is likely to occur.
You can observe whether the execution time of slow SQL Coprocessor is very high.
Mixed deployment will be relatively difficult 
Your machine has a sixteen-core CPU, and you configured it to use 19 cores. It’s surprising it didn’t crash, haha.
No, the machine has 96 cores, with a single NUMA node having 48 cores. Two TiKV instances are bound to NUMA0, two are bound to NUMA1, TiDB is bound to NUMA0, and PD is bound to NUMA1. So there should be contention.
Indeed, it is basically a classic and relatively complex mixed deployment issue. The multi-tenant feature can help solve it.
The quickest solution at this stage is to scale out. Later on, you can also partition the tidb-server, with different services using different computing nodes.
Use EXPLAIN or EXPLAIN ANALYZE to see where the SQL execution time is mainly spent, and adjust the modification trigger rate for automatic table analysis. Check if fast analyze is enabled, as analysis in TiDB is very time-consuming!
SET GLOBAL tidb_enable_fast_analyze=1
After adding two more TiDB servers, the business feedback indicates that the average response time is within expectations. It seems that in the future, we will still need to separate the deployment of components. Mixed deployment is a big pitfall.