Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: After cleaning up the log files in docdb on the prometheus node and restarting the node, the disk space was not released; previously it would be released after cleaning and restarting, but now it isn't. Why?
[TiDB Usage Environment] Production Environment
[TiDB Version] tidb v6.1.0
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue] After cleaning the log files in the Prometheus node's docdb and restarting the node, the disk space was not released. Previously, cleaning the files and restarting the node would release the space, but now it doesn't. Not sure why.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Long-running transactions may cause data to not be reclaimed in a timely manner.
Manually deleting files under docdb and restarting the Prometheus node did not free up disk space. What could be the issue? The Top SQL feature is also not enabled.
The command used to restart the Prometheus node is: tiup cluster restart <cluster-name> -N 172.17.4.249:9090. Even after restarting the node this way, the disk space is not released.
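A quick way to check where the space went before restarting anything is to compare what the filesystem reports with what is actually on disk under the Prometheus data directory. The path below is only a placeholder; use the Data Dir shown by tiup cluster display <cluster-name>.
df -h                              # filesystem-level usage as seen by the OS
du -sh /path/to/prometheus/data    # bytes actually present under the data dir (placeholder path)
# If df reports far more used space than du can account for, some process is still
# holding open handles to files that have already been deleted.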
Once the files are deleted, the space is immediately freed.
The release of space is not related to restarting Prometheus…
After deleting files, I need to restart the Prometheus node each time to release disk space. It used to work this way, but recently even restarting doesn’t help. I don’t know what the reason is.
How much space did you allocate… If it doesn’t work, try expanding it…
With log retention set to 3 days, nearly 400 GB has already been used.
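If the retention period is managed through the tiup topology, it should be adjustable via the monitoring node's storage_retention setting; this is a sketch from memory of the TiUP topology format, so verify the exact field name in your own cluster config:
tiup cluster edit-config <cluster-name>
# in the editor, under monitoring_servers, set for example:
#   storage_retention: "3d"
tiup cluster reload <cluster-name> -R prometheus   # apply the change to the Prometheus node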
I also encountered the issue where docdb files could not be immediately reclaimed after deletion. A systemctl restart of prometheus-9000 was required to release them.
It looks like the process is still holding the file handles, so the space cannot actually be released.
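A quick way to confirm this is to look for deleted-but-still-open files held by the Prometheus process; the 9090 port is the one used earlier in this thread, and <pid> is whatever the first command returns:
ss -lntp | grep 9090              # find the PID of the Prometheus process by its listening port
lsof -p <pid> | grep -i deleted   # files already removed from disk but still held open
# Alternatively, list every open-but-unlinked file on the host and filter for prometheus:
lsof +L1 | grep prometheus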
If that doesn't work, you can try stopping the service first, deleting the files, and then starting it again.
For example:
systemctl stop prometheus
(delete the data files …)
systemctl start prometheus
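Since the node is managed by tiup, the same stop → delete → start sequence can also be done with cluster commands. The docdb path below is a placeholder, so confirm the real data_dir first (e.g. via tiup cluster display <cluster-name>):
tiup cluster stop <cluster-name> -N 172.17.4.249:9090
rm -rf /path/to/prometheus/data/docdb/*    # placeholder path: delete the old data while the process is down
tiup cluster start <cluster-name> -N 172.17.4.249:9090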
tiup cluster reload <cluster-name> -R prometheus
tiup cluster reload <cluster-name> -R grafana
The issue has been resolved. Restarting or reloading Prometheus doesn’t work well; directly killing the monitoring node process can free up disk space.
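For reference, a minimal sketch of that workaround (the port is the one from this thread; whether systemd brings the process back up on its own depends on how the unit was generated, so check and start the node again if needed):
ss -lntp | grep 9090     # find the PID of the monitoring (Prometheus) process by its port
kill <pid>               # once the process exits, its handles to the deleted files are dropped
df -h                    # the space held by deleted-but-open files should now be reported as free
tiup cluster start <cluster-name> -N 172.17.4.249:9090   # bring the node back up if it was not restarted automatically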