Client Reports "Server is Busy" Error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 客户端报 server is busy 错误

| username: yulei7633

The screenshot above quotes the documentation:
A TiKV node contains two RocksDB instances: one stores the Raft log and is located at data/raft; the other stores the actual data and is located at data/db. You can check the specific reason for a stall by running grep "Stalling" against the RocksDB logs. The RocksDB log files are the files whose names start with LOG, where LOG itself is the current log.
I understand this passage, but it doesn't match my actual TiKV environment.
The files differ from what the documentation describes. How exactly do I check the specific reason for stalling with grep "Stalling" on the RocksDB logs? Which file should I query?

| username: yulei7633 | Original post link

The monitoring does show "server is busy." I want to find out exactly where the problem is occurring. The documentation says to check the specific reason for the stall by running grep "Stalling" on the RocksDB logs. How do I run this query?

| username: 大飞哥online | Original post link

Check the rocksdb.log for occurrences of “Stalling”.
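To make this concrete, here is a minimal, hedged sketch. The `data/db` and `data/raft` paths are the TiKV defaults quoted from the documentation above; substitute your actual deploy directory. The sample LOG line below is fabricated for illustration only, so the commands are runnable end to end:

```shell
# RocksDB info logs are the files named LOG (current) and LOG.old.<timestamp>.
# Default locations in a TiKV data directory (adjust to your deployment):
#   <data-dir>/db/LOG*    - the data RocksDB
#   <data-dir>/raft/LOG*  - the Raft-log RocksDB

# Simulate a data directory with one LOG file so the grep below is runnable:
mkdir -p /tmp/tikv-demo/db
cat > /tmp/tikv-demo/db/LOG <<'EOF'
[WARN] [column_family.cc] [default] Stalling writes because we have 5 immutable memtables (waiting for flush)
EOF

# Search every RocksDB log file for the stall reason:
grep -h "Stalling" /tmp/tikv-demo/db/LOG*
```

Against a real cluster you would point the same `grep` at `<data-dir>/db/LOG*` and `<data-dir>/raft/LOG*` on each TiKV node.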

| username: h5n1 | Original post link

There is a stall reason in the TiKV monitoring of RocksDB.

| username: yulei7633 | Original post link

I did not find this file.

| username: yulei7633 | Original post link

During the same time period, it looks normal in the monitoring graph.

| username: dba远航 | Original post link

Is the path you are querying correct?

| username: Jellybean | Original post link

In addition to the client reporting the server is busy error, you should also check for other related anomalies around the time of the issue. For example, any of the following can trigger a server is busy error:

  • Write stall occurs
  • Scheduler too busy occurs
  • Raftstore is busy occurs
  • Coprocessor backlog occurs

For specific methods of confirmation and solutions, you can refer to the official documentation.
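As a sketch of how one might scan a TiKV log for these triggers in one pass (the log path and the sample lines are assumptions made for illustration, not real TiKV output; point the final `grep` at your real `tikv.log`):

```shell
# Fabricated tikv.log with sample lines so the command is runnable:
mkdir -p /tmp/tikv-demo
cat > /tmp/tikv-demo/tikv.log <<'EOF'
[2024/01/01 00:00:01] [WARN] [store.rs] "scheduler too busy" [region_id=1]
[2024/01/01 00:00:02] [WARN] [endpoint.rs] "coprocessor is busy"
EOF

# Search for the common "server is busy" triggers listed above:
grep -iE "write stall|scheduler (is |too )busy|raftstore is busy|coprocessor" \
    /tmp/tikv-demo/tikv.log
```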

| username: 随缘天空 | Original post link

It is likely caused by high load on the current TiKV server, which may be due to high concurrent access, large data processing, or other system load. You can check the TiKV server's resource usage (CPU, memory, etc.) during that period, and also look at the monitoring graphs to see whether there are any hotspot issues.

| username: h5n1 | Original post link

Then it must be caused by other reasons.

| username: 小龙虾爱大龙虾 | Original post link

In newer versions, RocksDB's flow control mechanism has been replaced by flow control at the scheduler layer. Refer to: TiKV Configuration File | PingCAP Docs.
At the same time, the TiKV Details panel has added a dedicated Flow Control tab where you can view the related monitoring.
Check the TiDB logs and search for the keyword "server is busy" to see the reason for the busy status. The root cause is usually high pressure on TiKV or a hotspot issue.
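A minimal sketch of that log search (the log path and the sample line are assumptions for illustration; on a real deployment the TiDB log is typically under the deploy directory, e.g. a `log/tidb.log` file):

```shell
# Fabricated tidb.log line so the search is runnable as-is:
mkdir -p /tmp/tidb-demo
cat > /tmp/tidb-demo/tidb.log <<'EOF'
[2024/01/01 00:00:03] [WARN] [backoff.go] [error="server is busy"]
EOF

# Case-insensitive search for the busy keyword:
grep -i "server is busy" /tmp/tidb-demo/tidb.log
```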