Free disk space below 10G triggers stop-write-at-available-space (default 10G) and binlog stops being written; binlog is still not written after the disk is expanded

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 磁盘空间小于10G,stop-write-at-available-space 默认10G,导致不写binlog,磁盘扩容后还是不写binlog

| username: suqingbin0315

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.1
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Free disk space dropped below 10G. Because stop-write-at-available-space defaults to 10G, binlog stopped being written. Even after the disk was expanded, binlog is still not written.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Pump log when the disk was full:
[2023/05/25 09:51:05.700 +08:00] [INFO] [server.go:563] ["server info tick"] [writeBinlogCount=87937391] [alivePullerCount=1] [MaxCommitTS=441707256732188673]
[2023/05/25 09:51:06.697 +08:00] [WARN] [storage.go:340] ["no available space, you may want to free up some space or decrease stop-write-at-available-space configuration"] [available=10724401152] [StopWriteAtAvailableSpace=10737418240]
Current Pump log:
[2023/05/25 11:27:15.700 +08:00] [INFO] [server.go:563] ["server info tick"] [writeBinlogCount=87937393] [alivePullerCount=1] [MaxCommitTS=441708768922959882]
[2023/05/25 11:27:25.698 +08:00] [INFO] [storage.go:387] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":8879673729,\"IORead\":13073115775,\"BlockCacheSize\":8046004,\"OpenedTablesCount\":7,\"LevelSizes\":[114836538,731288813],\"LevelTablesCounts\":[5,11],\"LevelRead\":[0,3638915187],\"LevelWrite\":[1515677908,2969180855],\"LevelDurations\":[30038423418,113505522758]}"]
[2023/05/25 11:27:25.700 +08:00] [INFO] [server.go:563] ["server info tick"] [writeBinlogCount=87937393] [alivePullerCount=1] [MaxCommitTS=441708772081795074]
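
The two numbers in the WARN line explain why writes stopped: StopWriteAtAvailableSpace=10737418240 bytes is exactly 10 GiB (10 × 1024³), and available=10724401152 bytes is 13017088 bytes (about 12.4 MiB) short of it. The minimal Python sketch below decodes that line. It assumes only the [key=value] field layout shown in the log; the helper names parse_fields and space_check are hypothetical, and the `<` comparison is an assumption about how Pump applies the threshold, not Pump's actual code.

```python
import re

# The WARN line from the Pump log above; the message text is shortened,
# the numeric fields are copied verbatim.
WARN_LINE = (
    "[2023/05/25 09:51:06.697 +08:00] [WARN] [storage.go:340] "
    '["no available space, ..."] '
    "[available=10724401152] [StopWriteAtAvailableSpace=10737418240]"
)

GIB = 1024 ** 3
MIB = 1024 ** 2

# Matches [key=integer] fields such as [available=10724401152].
FIELD_RE = re.compile(r"\[(\w+)=(\d+)\]")


def parse_fields(line: str) -> dict:
    """Return every [key=integer] field of a log line as {key: int}."""
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


def space_check(line: str) -> tuple:
    """Return (writes_stopped, shortfall_bytes) for a Pump space WARN line.

    Assumption: writes stop when available < StopWriteAtAvailableSpace.
    """
    fields = parse_fields(line)
    available = fields["available"]
    threshold = fields["StopWriteAtAvailableSpace"]
    return available < threshold, threshold - available


if __name__ == "__main__":
    fields = parse_fields(WARN_LINE)
    stopped, shortfall = space_check(WARN_LINE)
    print(f"threshold: {fields['StopWriteAtAvailableSpace'] / GIB:.2f} GiB")  # 10.00 GiB
    print(f"available: {fields['available'] / GIB:.3f} GiB")  # 9.988 GiB
    print(f"stopped={stopped}, short by {shortfall / MIB:.1f} MiB")  # True, 12.4 MiB
```

In other words, Pump stopped with only about 12 MiB to spare below the limit, so even a small expansion should have cleared the threshold.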

| username: Billmay表妹

Based on the information provided, Pump stopped writing binlog when free disk space fell below 10G, and it still does not write binlog after the disk was expanded. The Pump log also shows that writeBinlogCount is no longer increasing.
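
Whether writeBinlogCount has stopped advancing can be checked mechanically rather than by eye. In the two 11:27 ticks above the count stays at 87937393 while MaxCommitTS keeps increasing. The following is a minimal Python sketch that assumes only the [writeBinlogCount=N] field format shown in the Pump log; write_counts and is_stalled are hypothetical helper names.

```python
import re

# Two "server info tick" lines from the current Pump log above (fields verbatim).
TICKS = [
    "[2023/05/25 11:27:15.700 +08:00] [INFO] [server.go:563] "
    '["server info tick"] [writeBinlogCount=87937393] [alivePullerCount=1] '
    "[MaxCommitTS=441708768922959882]",
    "[2023/05/25 11:27:25.700 +08:00] [INFO] [server.go:563] "
    '["server info tick"] [writeBinlogCount=87937393] [alivePullerCount=1] '
    "[MaxCommitTS=441708772081795074]",
]

COUNT_RE = re.compile(r"\[writeBinlogCount=(\d+)\]")


def write_counts(lines):
    """Extract writeBinlogCount from each line that carries it, in log order."""
    counts = []
    for line in lines:
        match = COUNT_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_stalled(lines) -> bool:
    """True if there are at least two ticks and the count never goes up."""
    counts = write_counts(lines)
    return len(counts) >= 2 and all(b <= a for a, b in zip(counts, counts[1:]))


if __name__ == "__main__":
    print(write_counts(TICKS))  # [87937393, 87937393]
    print("stalled" if is_stalled(TICKS) else "advancing")  # stalled
```

Feeding a longer window of ticks (for example, everything logged since the expansion) gives a clearer signal than two samples ten seconds apart.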

First, confirm that binlog is enabled. If it is not, nothing is written to Pump. Binlog is switched on through binlog.enable in the TiDB configuration; see the binlog configuration section of the official TiDB documentation.

If binlog is already enabled, check the Pump log for error messages. If there are none, try restarting the Pump service so that its configuration takes effect.

If Pump still does not write binlog after the restart, check that the disk expansion really succeeded and that the available space now exceeds the stop-write-at-available-space threshold. If it does, you can also try lowering stop-write-at-available-space so that Pump can write binlog normally.
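
The "does the expanded disk clear the threshold" check can be done from the Pump host. Below is a minimal Python sketch, assuming the default threshold of 10737418240 bytes reported in the WARN log and the data_dir from this thread. If you changed stop-write-at-available-space, pass your own value. Pump may measure available space slightly differently from shutil.disk_usage, so treat a small margin as inconclusive. headroom_bytes is a hypothetical helper name.

```python
import os
import shutil

# Default threshold as reported in the Pump WARN log:
# StopWriteAtAvailableSpace=10737418240 bytes, i.e. 10 GiB.
DEFAULT_STOP_WRITE_BYTES = 10 * 1024 ** 3

# data_dir of the Pump instance in this thread; change it for your deployment.
PUMP_DATA_DIR = "/home/pirate/programs/tidb-data/pump-8250"


def headroom_bytes(path: str, threshold: int = DEFAULT_STOP_WRITE_BYTES) -> int:
    """Free bytes on the filesystem holding `path`, minus the stop-write threshold.

    shutil.disk_usage().free is the space available to unprivileged users.
    A negative result means the disk is still below the threshold.
    """
    return shutil.disk_usage(path).free - threshold


if __name__ == "__main__":
    if not os.path.isdir(PUMP_DATA_DIR):
        print(f"{PUMP_DATA_DIR} does not exist here; set PUMP_DATA_DIR first")
    else:
        margin = headroom_bytes(PUMP_DATA_DIR)
        verdict = "above" if margin > 0 else "below"
        print(f"{verdict} the threshold by {abs(margin) / 1024 ** 3:.2f} GiB")
```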

If none of the above solves the problem, try upgrading TiDB and Pump.

| username: suqingbin0315

After restarting the Pump service, binlog is still not being written.
[2023/05/25 15:26:11.695 +08:00] [INFO] [server.go:563] ["server info tick"] [writeBinlogCount=0] [alivePullerCount=0] [MaxCommitTS=441712526993129477]
[2023/05/25 15:26:21.694 +08:00] [INFO] [storage.go:387] [DBStats] [DBStats="{\"WriteDelayCount\":0,\"WriteDelayDuration\":0,\"WritePaused\":false,\"AliveSnapshots\":0,\"AliveIterators\":0,\"IOWrite\":14303991,\"IORead\":42327362,\"BlockCacheSize\":2406698,\"OpenedTablesCount\":6,\"LevelSizes\":[129107822,731288813],\"LevelTablesCounts\":[6,11],\"LevelRead\":[0,0],\"LevelWrite\":[0,0],\"LevelDurations\":[0,0]}"]
[2023/05/25 15:26:21.696 +08:00] [INFO] [server.go:563] ["server info tick"] [writeBinlogCount=0] [alivePullerCount=1] [MaxCommitTS=441712530138857497]


Binlog stopped being written after 1 AM, once free disk space dropped below 10G.

Checking with tiup cluster show-config: binlog is enabled in the TiDB configuration, but there is no binlog setting under Pump.
server_configs:
  tidb:
    binlog.enable: true
    binlog.ignore-error: true

pump_servers:
- host: 172.
  ssh_port: 22
  port: 8250
  deploy_dir: /home/pirate/programs/tidb-deploy/pump-8250
  data_dir: /home/pirate/programs/tidb-data/pump-8250
  log_dir: /home/pirate/programs/log/pump-8250
  arch: amd64
  os: linux

The Pump configuration file is the default one:
cat /home/pirate/programs/tidb-deploy/pump-8250/conf/pump.toml

# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   pump:
#     aa.b1.c3: value
#     aa.b2.c4: value

| username: suqingbin0315

Setting stop-write-at-available-space to a smaller value doesn't help either.

Can I upgrade the Pump version separately without upgrading the cluster?

| username: suqingbin0315

The TiDB log shows that once the expansion succeeded at 9:51, TiDB marked the Pump as available again and began writing binlog, but neither Pump nor Drainer received any binlog.
[2023/05/25 09:51:17.450 +08:00] [INFO] [client.go:570] ["[pumps client] write detect binlog to unavailable pump success"] [NodeID=172.23.224.123:8250]
[2023/05/25 09:51:17.450 +08:00] [INFO] [client.go:397] ["[pumps client] set pump available"] [NodeID=172.23.224.123:8250] [available=true]

| username: Raymond

Check your TiDB configuration parameter binlog.ignore-error.

| username: suqingbin0315 | Original post link

tidb:
    binlog.enable: true
    binlog.ignore-error: true

| username: Raymond

You have set binlog.ignore-error: true. With this setting, if a Pump problem prevents the TiDB node from writing binlog, the TiDB node skips writing to Pump from then on instead of failing, and it does not switch back on its own. To reset the TiDB node's Pump write status, run:

curl http://{TiDBIP}:10080/binlog/recover
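
Raymond's reply describes the skip as happening per TiDB node, so in a cluster with several TiDB servers the call has to reach each of them. Below is a minimal Python sketch that issues the same GET request as the curl command above, assuming only the status port 10080 and the /binlog/recover path quoted in this thread. recover_url, recover, and recover_all are hypothetical helper names, and the timeout value is an arbitrary choice.

```python
import urllib.request

# TiDB status port used in this thread.
TIDB_STATUS_PORT = 10080


def recover_url(host: str, port: int = TIDB_STATUS_PORT) -> str:
    """URL of the binlog recover endpoint quoted in this thread."""
    return f"http://{host}:{port}/binlog/recover"


def recover(host: str, timeout: float = 60.0) -> str:
    """Issue the same GET as `curl http://{TiDBIP}:10080/binlog/recover`.

    Returns the raw response body; the timeout is an arbitrary choice.
    """
    with urllib.request.urlopen(recover_url(host), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def recover_all(hosts) -> dict:
    """Call recover() on every TiDB node; each node tracks its own skip state."""
    return {host: recover(host) for host in hosts}
```

Usage would look like recover_all(["tidb-1.example", "tidb-2.example"]), where the host names are placeholders for your own TiDB nodes.
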

| username: suqingbin0315

Following the expert's suggestion, I found this blog post: 专栏 - 监控告警处理之tidb_server_critical_error_total | TiDB 社区 (Column: handling the tidb_server_critical_error_total monitoring alert, TiDB Community). Running binlog/recover solved the issue, and I can see binlog being written again. However, curl http://ip:10080/info/all still shows "binlog_status": "Skipping". I'll ignore that for now. Many thanks to the expert.

| username: suqingbin0315

One more question: can binlog be cleaned up manually? Binlog from more than 7 days before the incident is still there.

| username: Raymond

Regarding curl http://ip:10080/info/all still showing "binlog_status": "Skipping": this is a bug, fixed in version 6.5.1.
Instead, you can run curl http://{TiDBIP}:10080/binlog/recover?op=status
If it returns skipped:false, the TiDB node is no longer skipping writes to Pump and has recovered.
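
Since /info/all may keep reporting "Skipping" on this version, the op=status query is the one to script. Below is a minimal Python sketch, assuming the response body is a JSON object with a boolean field named "skipped" in some capitalization. The thread only shows skipped:false, so the key is matched case-insensitively. status_url, is_skipping, and check are hypothetical helper names.

```python
import json
import urllib.request


def status_url(host: str, port: int = 10080) -> str:
    """URL of the status query quoted in this thread (op=status)."""
    return f"http://{host}:{port}/binlog/recover?op=status"


def is_skipping(body: str) -> bool:
    """Return the value of the "skipped" field from an op=status response.

    Assumption: the body is a JSON object containing a boolean field whose
    name is "skipped" in some capitalization; the exact key is not confirmed.
    """
    data = json.loads(body)
    for key, value in data.items():
        if key.lower() == "skipped":
            return bool(value)
    raise KeyError("no 'skipped' field in response: " + body)


def check(host: str, timeout: float = 10.0) -> bool:
    """Query one TiDB node; False means it is no longer skipping Pump writes."""
    with urllib.request.urlopen(status_url(host), timeout=timeout) as resp:
        return is_skipping(resp.read().decode("utf-8"))
```

Separating the parsing (is_skipping) from the HTTP call (check) lets you verify the parser against the real output of your TiDB version before relying on it.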

| username: Raymond

You can delete them manually with rm, or configure the relevant parameters.
For the parameters, see this article: 专栏 - drainer binlog 清理机制 源码详解 | TiDB 社区 (Column: the drainer binlog cleanup mechanism explained through the source code, TiDB Community)
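
Before removing anything with rm, it helps to see which files are actually older than the retention window in question. The Python sketch below is a dry run that only lists files under a directory whose modification time is more than N days old. It assumes file mtime is a usable proxy for binlog age, which may not hold for every file Pump keeps. It does not decide which files are safe to delete: removing binlog that Drainer has not yet consumed, or Pump's internal metadata, could break replication, so check the cleanup article first. The directory is the data_dir from this thread, and files_older_than is a hypothetical helper name.

```python
import os
import time
from typing import List, Optional

# Data directory of the Pump instance in this thread (adjust as needed).
PUMP_DATA_DIR = "/home/pirate/programs/tidb-data/pump-8250"

SECONDS_PER_DAY = 86400


def files_older_than(root: str, days: float, now: Optional[float] = None) -> List[str]:
    """Dry run: list files under `root` last modified more than `days` days ago.

    Nothing is deleted; review the list before removing anything by hand.
    """
    cutoff = (time.time() if now is None else now) - days * SECONDS_PER_DAY
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return sorted(old)


if __name__ == "__main__":
    # Prints nothing if the directory does not exist on this machine.
    for path in files_older_than(PUMP_DATA_DIR, days=7):
        print(path)
```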

| username: suqingbin0315

Thank you, thank you.

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.