Client Reports "Server is Busy" Error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 客户端报 server is busy 错误

| username: yulei7633

The screenshot above quotes the documentation:
A TiKV node contains two RocksDB instances: one stores the Raft log and is located at data/raft; the other stores the actual data and is located at data/db. You can check the specific reason for a stall by running grep "Stalling" against the RocksDB logs. The RocksDB log files are the files whose names start with LOG, where LOG itself is the current log.
I understand this passage, but it doesn't match my actual TiKV environment.
The files differ from what the documentation describes. How exactly do I check the specific reason for stalling with grep "Stalling" on the RocksDB logs? Which file should I query?

| username: yulei7633 | Original post link

The monitoring does show "server is busy." I want to find out exactly where the problem is occurring. The documentation says to check the specific reason for the stall by running grep "Stalling" on the RocksDB logs. How do I run this query?

| username: 大飞哥online | Original post link

Check the rocksdb.log for occurrences of “Stalling”.
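To make this concrete, here is a minimal, hedged sketch. The `data/db` and `data/raft` paths are the TiKV defaults quoted from the documentation above; substitute your actual deploy directory. The sample LOG line below is fabricated for illustration only, so the commands are runnable end to end:

```shell
# RocksDB info logs are the files named LOG (current) and LOG.old.<timestamp>.
# Default locations in a TiKV data directory (adjust to your deployment):
#   <data-dir>/db/LOG*    - the data RocksDB
#   <data-dir>/raft/LOG*  - the Raft-log RocksDB

# Simulate a data directory with one LOG file so the grep below is runnable:
mkdir -p /tmp/tikv-demo/db
cat > /tmp/tikv-demo/db/LOG <<'EOF'
[WARN] [column_family.cc] [default] Stalling writes because we have 5 immutable memtables (waiting for flush)
EOF

# Search every RocksDB log file for the stall reason:
grep -h "Stalling" /tmp/tikv-demo/db/LOG*
```

Against a real cluster you would point the same `grep` at `<data-dir>/db/LOG*` and `<data-dir>/raft/LOG*` on each TiKV node.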

| username: h5n1 | Original post link

There is a stall reason in the TiKV monitoring of RocksDB.

| username: yulei7633 | Original post link

I did not find this file.

| username: yulei7633 | Original post link

During the same time period, it looks normal in the monitoring graph.

| username: dba远航 | Original post link

Is the path you are querying correct?

| username: Jellybean | Original post link

In addition to the client reporting the server is busy error, you should also check for other related anomalies around the time of the issue. For example, any of the following can trigger a server is busy error:

  • Write stall occurs
  • Scheduler too busy occurs
  • Raftstore is busy occurs
  • Coprocessor backlog occurs

For specific methods of confirmation and solutions, you can refer to the official documentation.
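As a sketch of how one might scan a TiKV log for these triggers in one pass (the log path and the sample lines are assumptions made for illustration, not real TiKV output; point the final `grep` at your real `tikv.log`):

```shell
# Fabricated tikv.log with sample lines so the command is runnable:
mkdir -p /tmp/tikv-demo
cat > /tmp/tikv-demo/tikv.log <<'EOF'
[2024/01/01 00:00:01] [WARN] [store.rs] "scheduler too busy" [region_id=1]
[2024/01/01 00:00:02] [WARN] [endpoint.rs] "coprocessor is busy"
EOF

# Search for the common "server is busy" triggers listed above:
grep -iE "write stall|scheduler (is |too )busy|raftstore is busy|coprocessor" \
    /tmp/tikv-demo/tikv.log
```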

| username: 随缘天空 | Original post link

It is likely caused by high load on the current TiKV server, which may be due to high concurrent access, large data processing, or other system load. You can check the TiKV server's resource usage (CPU, memory, etc.) during that period, and also look at the monitoring graphs to see whether there are any hotspot issues.

| username: h5n1 | Original post link

Then it must be caused by other reasons.

| username: 小龙虾爱大龙虾 | Original post link

In newer versions, RocksDB's flow control mechanism has been replaced by flow control at the scheduler layer. Refer to: TiKV Configuration File | PingCAP Docs.
At the same time, the TiKV Details panel has added a dedicated Flow Control tab where you can view the related monitoring.
Check the TiDB logs and search for the keyword "server is busy" to see the reason for the busy status. The root cause is usually high pressure on TiKV or a hotspot issue.
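A minimal sketch of that log search (the log path and the sample line are assumptions for illustration; on a real deployment the TiDB log is typically under the deploy directory, e.g. a `log/tidb.log` file):

```shell
# Fabricated tidb.log line so the search is runnable as-is:
mkdir -p /tmp/tidb-demo
cat > /tmp/tidb-demo/tidb.log <<'EOF'
[2024/01/01 00:00:03] [WARN] [backoff.go] [error="server is busy"]
EOF

# Case-insensitive search for the busy keyword:
grep -i "server is busy" /tmp/tidb-demo/tidb.log
```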