Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb日志猛增 (TiDB logs surging)
[Issue Encountered]: In the test environment, a colleague deployed TiDB, 3 TiKV nodes, and Prometheus on a virtual machine with a 200 GB disk. Starting this Tuesday, disk space warnings began to appear, and the TiDB log files turned out to have grown to 70 GB. After manually deleting everything except the last three days of logs, about 40 GB remained. Even now, with only the last two days retained, the logs still take up 30 GB. This did not happen before; the log volume has increased sharply. In the TiDB component's log directory, a new log file is generated every ten minutes or so, and each file is 300 MB. As of June 27, there are already nearly 100 log files.
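To see how fast the logs are actually growing, it can help to total the file sizes per day. Below is a minimal Python sketch. It assumes the log files match `tidb*.log` and that the modification time is a good enough stand-in for the day a file was written. Both are assumptions, and the directory in the usage comment is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def log_bytes_per_day(log_dir):
    """Sum the size of tidb*.log files in log_dir, grouped by modification date."""
    totals = defaultdict(int)
    for path in Path(log_dir).glob("tidb*.log"):
        stat = path.stat()
        # Group by the local calendar day of the last modification.
        day = datetime.fromtimestamp(stat.st_mtime).date().isoformat()
        totals[day] += stat.st_size
    return dict(totals)


# Usage (hypothetical path, replace with the real log directory):
# for day, size in sorted(log_bytes_per_day("/tidb-deploy/tidb-4000/log").items()):
#     print(f"{day}  {size / 1024**3:.2f} GiB")
```

A sudden jump from one day to the next narrows down when the surge started, which makes it easier to match against deployments or workload changes.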
Look at the errors in the logs. If you eliminate the underlying errors, the alerts will stop.
Tail the log to see what is being written. Generally, the most common entries during operation are related to slow coprocessor requests.
Why don’t you post which logs are flooding the screen?
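One way to find out which entries are flooding the log is to count lines by level and by source location. The sketch below assumes each entry starts with `[time] [LEVEL] [file.go:line]`. If the real lines look different, the regular expression needs adjusting. The function name and the file name in the usage comment are illustrative only.

```python
import re
from collections import Counter

# Assumed line shape: [time] [LEVEL] [file.go:line] ["message"] [field=value] ...
LINE_RE = re.compile(r"^\[[^\]]*\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\]")


def summarize(lines):
    """Count log lines per level and per (level, source) pair."""
    levels, sources = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # continuation line or a different format
        levels[m["level"]] += 1
        sources[(m["level"], m["source"])] += 1
    return levels, sources


# Usage (hypothetical file name):
# with open("tidb.log", encoding="utf-8", errors="replace") as f:
#     levels, sources = summarize(f)
# print(levels.most_common())
# print(sources.most_common(10))
```

The top few `(level, source)` pairs are usually enough to tell whether the volume comes from one repeated message or from general info-level chatter.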
The log level was not set at deployment time, and there are a lot of info logs. I changed the level to error, but I still see info-level entries in recent logs, so the change did not take effect. I will check whether I made the modification in the wrong place.
After modifying the configuration, you need to run the reload command for the change to take effect.
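The thread does not say how the cluster was deployed. Assuming it is managed with TiUP, the level would be changed under `server_configs` → `tidb` → `log.level` via `tiup cluster edit-config <cluster-name>`, and then applied with `tiup cluster reload`. A small Python wrapper for the reload step might look like this. The helper names are made up for illustration, and note that a reload restarts the affected TiDB instances.

```python
import subprocess


def reload_tidb_command(cluster_name):
    """Build the tiup command that reloads only the TiDB role of a cluster."""
    return ["tiup", "cluster", "reload", cluster_name, "-R", "tidb"]


def reload_tidb(cluster_name, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = reload_tidb_command(cluster_name)
    if not dry_run:
        # Raises CalledProcessError if tiup exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


# Usage (hypothetical cluster name):
# print(" ".join(reload_tidb("tidb-test")))   # dry run, only shows the command
# reload_tidb("tidb-test", dry_run=False)     # actually reloads
```

Limiting the reload to the `tidb` role avoids restarting TiKV and PD when only the TiDB log level was changed.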
Set the log level to error.
Check the log level. If that doesn't work, write a script to delete the old logs for now.
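As a stopgap, a cleanup script could look like the sketch below. It assumes rotated files are named `tidb-<timestamp>.log` while the active file is `tidb.log`, so the pattern `tidb-*.log` leaves the active file alone. Verify that naming on your own machine before running with `dry_run=False`. The `log.file.max-days` and `log.file.max-backups` settings in the TiDB configuration are worth checking too, since they may make a script unnecessary.

```python
import time
from pathlib import Path


def delete_old_logs(log_dir, keep_days=3, dry_run=True, now=None):
    """Remove rotated tidb-*.log files older than keep_days; return their paths."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob("tidb-*.log")):
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed


# Usage (hypothetical path): list candidates first, then delete.
# for p in delete_old_logs("/tidb-deploy/tidb-4000/log", keep_days=3):
#     print("would delete", p)
# delete_old_logs("/tidb-deploy/tidb-4000/log", keep_days=3, dry_run=False)
```

Defaulting to a dry run means a scheduled job only deletes files once the pattern has been checked against the real directory.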
Find some free time to reload the configuration.
First, take a look at what the errors are.
Please provide a screenshot of the logs.
Increase the log level to error.
You need to reload it for the changes to take effect.
You need to reload, otherwise it won’t take effect.
If the hardware configuration allows, it is recommended not to set it to error, as it can be quite troublesome when troubleshooting issues. Some SQL statements that report errors are logged as warnings. I have encountered situations where SQL execution errors were reported in the monitoring, but they were not visible in the logs. After researching, I found that such SQL-level errors are logged as warnings.
Check the TiDB configuration file and set the log level to something appropriate (e.g., info or warn); avoid the debug level.
Adjust the log level, and try to avoid using ERROR.
If the logs of other nodes have not surged, then check if the business is only using this one TiKV node.
Could it be a load balancing issue? Did everything end up on one host?