dm-worker Log Cleanup

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm-worker日志清理

| username: 孤独的狼

[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
[Reproduction Path] Operations performed that led to the issue

[Encountered Issue: Problem Description and Impact]
dm-worker node logs are too large
dm-worker-2023-12-06T05-52-59.693.log

[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

| username: 孤独的狼

The version is v7.1.0.

| username: 孤独的狼

[Image; content not available]

| username: 孤独的狼

The newly set up environment has not undergone any operations, but there is a DW tool synchronizing data for us.

| username: 孤独的狼

Does dm-worker have a parameter for cleaning up its logs periodically?

| username: jaybing926

The official tooling does not proactively clean up logs. You can write a script and schedule it with crontab to clean them up periodically.

For example, to periodically clean up log files older than 30 days:
find /path/to/log/dir/ -name '*.log*' -mtime +30 -delete
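
A minimal sketch of how the find command above could be wrapped in a script and scheduled with cron. The function name clean_old_logs, the script path in the crontab comment, and the 03:00 schedule are assumptions for illustration, not something DM provides.

```shell
#!/bin/sh
# Hypothetical helper: delete log files under a directory that were last
# modified more than N days ago. Names and defaults are illustrative only.
clean_old_logs() {
    local log_dir="$1"
    local days="${2:-30}"

    # Refuse to run against a missing directory so a typo cannot hit another path.
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        return 1
    fi

    # -type f restricts the match to regular files; -mtime +N matches files
    # modified more than N days ago (find counts age in whole 24-hour periods).
    find "$log_dir" -type f -name '*.log*' -mtime "+$days" -delete
}

# Run only when arguments are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    clean_old_logs "$@"
fi

# Example crontab entry (assumed script path), running daily at 03:00:
# 0 3 * * * /usr/local/bin/clean_dm_logs.sh /path/to/log/dir 30
```

Running the same find command with -print instead of -delete first shows which files would be removed.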

| username: 芮芮是产品

The logs can be deleted.

| username: TiDBer_小阿飞

Automatic Data Cleanup

To enable automatic data cleanup, configure the following in the source configuration file:

# relay log purge strategy
purge:
    interval: 3600
    expires: 24
    remain-space: 15

  • purge.interval
    • The interval between automatic background cleanups, in seconds.
    • Default is “3600”, meaning the background cleanup task runs every 3600 seconds.
  • purge.expires
    • The number of hours a relay log is retained before the background process cleans it up. This applies to relay logs that are not being written by the current relay processing unit or are not needed by current or future data migration tasks.
    • Default is “0”, meaning cleanup is not performed based on the relay log’s update time.
  • purge.remain-space
    • The remaining disk space threshold, in GB. If the remaining disk space falls below this value, the DM-worker machine attempts to clean up safely removable relay logs in the background. If set to “0”, cleanup is not performed based on remaining disk space.
    • Default is “15”, meaning DM-worker attempts to safely clean up relay logs when available disk space is less than 15 GB.
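
The two thresholds can be read as simple predicates. The sketch below only mirrors the rules described in the list above; it is not DM's actual code. The function names are hypothetical, the strict comparisons are an assumption, and the check that a relay log is not in use by a task is left out.

```shell
# Illustrative only: returns 0 (true) when a space-based purge would be due.
space_purge_due() {
    local avail_gb="$1"
    local remain_space="${2:-15}"
    # A remain-space of 0 disables space-based cleanup.
    [ "$remain_space" -gt 0 ] && [ "$avail_gb" -lt "$remain_space" ]
}

# Illustrative only: returns 0 (true) when a relay log is older than
# purge.expires (both values in hours).
time_purge_due() {
    local age_hours="$1"
    local expires="${2:-0}"
    # An expires value of 0 disables time-based cleanup.
    [ "$expires" -gt 0 ] && [ "$age_hours" -gt "$expires" ]
}
```

With the sample configuration (expires: 24, remain-space: 15), time_purge_due 30 24 and space_purge_due 10 15 both succeed, while either function fails when its threshold is 0.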

Manual Data Cleanup

Manual data cleanup means using the purge-relay command provided by dmctl to clean up all relay logs before a specified binlog file, identified by its relay log subdirectory and binlog file name. If the --sub-dir option is not specified in the command, all relay logs before the latest relay log subdirectory are cleaned up by default.
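
A hedged sketch of what a purge-relay invocation might look like when wrapped in a small helper. The function name purge_relay_before and the DRY_RUN switch are hypothetical, and the master address, source ID, and binlog file name in the usage line are placeholders. By default the helper only prints the command so it can be reviewed first.

```shell
# Hypothetical wrapper around dmctl's purge-relay command.
# Prints the command unless DRY_RUN=0 is set, in which case it runs it.
purge_relay_before() {
    local master_addr="$1"
    local source_id="$2"
    local binlog_file="$3"
    local sub_dir="${4:-}"

    # Reuse the positional parameters as the argument list for dmctl.
    set -- tiup dmctl --master-addr "$master_addr" \
        purge-relay -s "$source_id" --filename "$binlog_file"

    # The relay log subdirectory is optional.
    if [ -n "$sub_dir" ]; then
        set -- "$@" --sub-dir "$sub_dir"
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage with placeholder values:
# purge_relay_before 127.0.0.1:8261 mysql-replica-01 mysql-bin.000006
```

Setting DRY_RUN=0 executes the command instead of printing it.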

| username: Jjjjayson_zeng

For binlogs, our procedure is as follows:

First, we compress and back up a batch, then delete the originals. If there are any issues, we restore from the backup. Generally, we keep only 3 days’ worth. Since it’s a cluster deployment, the script can be placed on each worker node. Hope you can give a thumbs up.
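
The compress, back up, then delete routine could be sketched roughly as follows. This is an assumption about how such a script might look, not the poster's actual script. The function name archive_then_delete, the '*.log' pattern, and the archive naming are hypothetical, and the --null, --files-from, and --remove-files options assume GNU tar.

```shell
# Hypothetical helper: archive log files older than N days into a backup
# directory, removing each original once it has been added to the archive.
archive_then_delete() {
    local log_dir="$1"
    local backup_dir="$2"
    local days="${3:-3}"
    local archive

    mkdir -p "$backup_dir" || return 1
    archive="$backup_dir/logs-$(date +%Y%m%d%H%M%S).tar.gz"

    # Nothing to do when no file is old enough (avoids empty archives).
    if [ -z "$(find "$log_dir" -type f -name '*.log' -mtime "+$days" -print -quit)" ]; then
        return 0
    fi

    # NUL-delimited names survive spaces; tar reads the file list from stdin
    # and, with --remove-files (GNU tar), deletes files after archiving them.
    find "$log_dir" -type f -name '*.log' -mtime "+$days" -print0 |
        tar --null --remove-files -czf "$archive" --files-from=-
}

# Usage with placeholder paths, keeping the last 3 days:
# archive_then_delete /path/to/log/dir /path/to/backup 3
```

To restore, the archive can be unpacked with tar -xzf; note that GNU tar strips the leading "/" from absolute paths when creating the archive.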

| username: Jjjjayson_zeng

To add one more point, the downside is that whenever you add a new worker node, you need to create a new script. However, if you standardize this process, it should be manageable. After all, there usually aren’t that many nodes.

| username: 有猫万事足

Use tiup dm exec:

tiup dm exec [dm-cluster] --command='pwd' -R dm-worker

Try the command above. It runs pwd on every dm-worker node.
Replace pwd with the log cleanup command you need.

It’s best not to run rm -rf directly in place. First move (mv) the files to another location (such as /bak) on every node, then, after checking that nothing is wrong, run rm -rf /bak/*
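
A minimal sketch of the move-first step, assuming rotated files are named dm-worker-<timestamp>.log (as in the file name from the original post) and that the live log is dm-worker.log. The function name stage_rotated_logs and the paths in the usage line are hypothetical.

```shell
# Hypothetical helper: move rotated dm-worker logs into a staging directory
# instead of deleting them in place.
stage_rotated_logs() {
    local log_dir="$1"
    local staging_dir="$2"

    mkdir -p "$staging_dir" || return 1

    # The pattern requires a dash after "dm-worker", so a live file named
    # dm-worker.log (assumed name) is not matched and stays where it is.
    find "$log_dir" -maxdepth 1 -type f -name 'dm-worker-*.log' \
        -exec mv -- {} "$staging_dir"/ \;
}

# Usage with placeholder paths:
# stage_rotated_logs /path/to/log/dir /bak
```

The underlying find command can be passed to tiup dm exec through --command to run it on all dm-worker nodes; once the staged files have been checked, rm -rf /bak/* removes them.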

| username: 孤独的狼

Is the configuration file mentioned here a TiDB configuration file or a DM configuration file?

| username: Jjjjayson_zeng

It might be more reasonable to write it yourself. Take a look for reference.

| username: 孤独的狼

Okay, thank you.

| username: xingzhenxiang

Just delete the log files that have a date in their names.

| username: Jjjjayson_zeng

Please take some time to give a like and mark it as the best answer, Thanks♪(・ω・)ノ

| username: dba远航

Regular cleaning will do.

| username: xmlianfeng

Create a scheduled task that runs the cleanup command. For example:

find /data/tidb/dm/deploy/ -type f -name "dm-worker-2023*.log" -delete

| username: 孤独的狼

Thank you all, the issue has been resolved.
