The cluster shows "server is busy" status approximately every 30 minutes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群每隔大概30分钟出现server is busy状态

| username: yulei7633

Through slow queries, it was found that the cluster experiences a concentrated commit; approximately every 30 minutes, with 17,000 such commands each time, causing a brief slow response time for the entire cluster. Is this caused by the program or by some internal pressure within TiDB?

| username: 像风一样的男子 | Original post link

So regular, check if there are any scheduled tasks, and investigate all the business-related aspects of the database.

| username: yulei7633 | Original post link

I installed the database, and apart from the br backup, there are no scheduled tasks. How do I find the txn for this commit and then find the SQL based on the txn?

| username: zhanggame1 | Original post link

It’s so regular. At the time when there is an issue, try to check the database connection information to see if you can find out what is running.

select * from INFORMATION_SCHEMA.`PROCESSLIST`
where command<>'Sleep'
| username: 小龙虾爱大龙虾 | Original post link

From the monitoring panel, it shows that the scheduler is busy, and only one TiKV node reports this busy status. It is most likely caused by hotspot writes. You can analyze the hotspot write issue.

| username: yulei7633 | Original post link

Which Grafana metrics can be used to determine if hot data is being written?

| username: 随缘天空 | Original post link

Does your application have batch operations, such as batch inserts or imports, where the program submits hundreds of records at a time? Please check.

| username: yulei7633 | Original post link

The developer said there isn’t any, and all requests are coming from users one by one. However, I don’t quite believe the developer’s words, so we still need to find evidence to prove it.

| username: 小龙虾爱大龙虾 | Original post link

You can troubleshoot according to the articles in the column.

| username: 大飞哥online | Original post link

There is regularity, basically all are scheduled tasks execution, check the data itself or the application.

| username: 随缘天空 | Original post link

Yes, if you perform batch inserts or imports, this situation is likely to occur. You can check the corresponding node’s log information within a certain time range on the dashboard’s log search to see if there are logs indicating a large number of operations on the same table.


| username: forever | Original post link

Check if there are any hotspot tables in the traffic visualization.

| username: oceanzhang | Original post link

You can write a script to print out all the connections at that time.

| username: andone | Original post link

Check if there are any scheduled tasks.

| username: yulei7633 | Original post link

It seems to be related to analyze statistics. Every time it appears, it is accompanied by the appearance of the analyze statement.

| username: tidb菜鸟一只 | Original post link

I don’t think analyze should have that many commits. It might be that you have too many commits, and the corresponding table has many submissions, which triggered the automatic analyze (you can first check if it is within the time period for automatic information collection). I suggest checking if the corresponding analyze table has a lot of insert, delete, or update statements at that time.

| username: yulei7633 | Original post link

After testing, because the table has 1024 partitions, the background is constantly analyzing, which consumes a lot of resources and causes the server to be busy at intervals. My current solution is to turn off automatic collection and manually write scripts to collect statistics at a specified time in the early morning. According to the official statement, version 7.5 will be released in the next two months and will have good support for partition statistics collection. I will upgrade the cluster version and observe the situation then.

| username: xingzhenxiang | Original post link

How to disable automatic analyze? I also want to disable it.

| username: yulei7633 | Original post link

You can just turn this off. Especially for tables with many partitions, statistics will still be collected after turning it off. After turning it off, if the table has a large amount of data, you need to wait until the table operation is completed.

| username: xingzhenxiang | Original post link

Received and configured, thank you.