How does TiDB implement flow control?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 请问TIDB如何做流控

| username: TiDBer_Mb0v3ad7

【TiDB Usage Environment】Production, Testing, Research
【TiDB Version】v5.3.0
【Encountered Problem】High traffic in a short period on TiDB, needs optimization
【Problem Phenomenon and Impact】



CPU usage spikes to 100% and network traffic reaches 2 Gbps, which is a significant risk. How can we optimize this, for example with rate limiting?


| username: ealam_小羽 | Original post link

  1. You can refer to the performance tuning manual:

    Tune TiKV Thread Pool Performance | PingCAP Docs
  2. This looks like it might be caused by a scheduled script. Consider whether the business side can also apply some throttling.
  3. If the traffic is high, check the Dashboard to see whether it is caused by reading a large amount of data. The cause is not necessarily a large number of queries; a few queries that each retrieve a large amount of data can have the same effect and will usually show up as slow queries. You can consider limiting queries that run longer than a certain number of seconds.
| username: ddhe9527 | Original post link

  1. On the TiDB instance side, you can control concurrency through token-limit.

  2. Optimize SQL to push operators down to the store as much as possible to reduce network traffic.

  3. Expand the number of TiDB instances and set up load balancing (LB) properly.
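
To make the first point more concrete, here is a minimal conceptual sketch of what a token limit does: a request must hold a token while it executes, so at most a fixed number of requests run at the same time and the rest wait. This is an illustration only, not TiDB's actual implementation; the `TokenLimiter` class, the `demo` helper, and the limit of 4 are hypothetical names and values chosen for the example.

```python
import threading
import time


class TokenLimiter:
    """Conceptual token limit: at most `limit` requests execute at once."""

    def __init__(self, limit: int) -> None:
        self._tokens = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self.running = 0  # requests currently holding a token
        self.peak = 0     # highest concurrency observed so far

    def run(self, work):
        # Block until a token is free, then execute the request.
        with self._tokens:
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return work()
            finally:
                with self._lock:
                    self.running -= 1


def fake_query() -> None:
    # Stand-in for a query; it only sleeps briefly (hypothetical workload).
    time.sleep(0.01)


def demo(limit: int = 4, requests: int = 20) -> int:
    """Submit `requests` concurrent requests and return the peak concurrency."""
    limiter = TokenLimiter(limit)
    threads = [
        threading.Thread(target=limiter.run, args=(fake_query,))
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return limiter.peak


if __name__ == "__main__":
    # The observed peak never exceeds the configured limit.
    print("peak concurrency:", demo(limit=4, requests=20))
```

In this sketch, requests beyond the limit queue up rather than fail, so a lower limit trades latency for lower peak load.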

| username: TiDBer_Mb0v3ad7 | Original post link

May I ask if this token-limit is reflected in the monitoring? I will check if it also surged during the abnormal time period.

| username: TiDBer_Mb0v3ad7 | Original post link

The traffic flows from the TiKV nodes to TiDB. However, this data is not returned directly to the business service, since the traffic between the business service and TiDB is not particularly high.

| username: ddhe9527 | Original post link

There is no direct view for it, but you can check the total number of connections (active + inactive) in the Connection Count panel under tidb -> Server. You can also estimate the number of concurrently active sessions as QPS * Duration / 1000 ms, where Duration is the average query duration in milliseconds.
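
The estimate above is a form of Little's law (average concurrency = arrival rate × average time in the system). A small worked sketch follows; the function name and the sample readings (5000 QPS, 20 ms average duration) are hypothetical and are not taken from this cluster.

```python
def estimate_active_sessions(qps: float, avg_duration_ms: float) -> float:
    """Estimate concurrently active sessions as QPS * Duration / 1000 ms."""
    return qps * avg_duration_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical monitoring readings: 5000 queries/s, 20 ms average duration.
    print(estimate_active_sessions(5000, 20))  # 100.0
```

Comparing this estimate during the spike with the same estimate during normal load gives a rough idea of whether the number of concurrently executing statements actually surged.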

| username: ealam_小羽 | Original post link

That sounds like it might be caused by slow queries. Some queries might not be using an index, causing TiDB to fetch a lot of data from TiKV and then aggregate it at the TiDB server layer. You can check the slow queries and SQL resource consumption in TiDB Dashboard for that period.

| username: cs58_dba | Original post link

Well, it is generally more likely to be a full table scan.

| username: tidb狂热爱好者 | Original post link

  1. On the TiDB instance side, you can control concurrency through token-limit.

  2. Optimize SQL to push operators down to the store as much as possible to reduce network traffic.

  3. Increase the number of TiDB instances and set up load balancing.

Actually, the main cause is SQL with very high total CPU usage. Identify the statements with the highest CPU usage in the Dashboard and optimize them to resolve the issue.

| username: Raymond | Original post link

  1. The scheduler can also be throttled.


    TiKV Configuration File | PingCAP Docs

  2. RocksDB can also be throttled.
