Disk filled up on one pump machine, and the drainer does not resume after cleanup

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pump有一台机器磁盘满了,清理后drainer不走

| username: kuweilong666

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
Pumps run on machines 157, 186, and 187. The disk on machine 157 filled up, which caused its pump to stop writing logs. After space was freed, the drainer still does not advance and remains inactive.

[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: WalterWj | Original post link

Try restarting the pump and drainer.

| username: CuteRay | Original post link

After cleaning up, restart the pump first.

| username: kuweilong666 | Original post link

Restarting the pump requires restarting the whole TiDB cluster, which is quite troublesome. The key point is that the pump on 157 is now writing logs normally again.

| username: CuteRay | Original post link

There’s no need for that. Why would you need to restart the TiDB cluster just to restart the pump? Each component of the TiDB cluster can be restarted individually.

| username: kuweilong666 | Original post link

The cluster startup operation will start all components of the entire TiDB cluster in the order of PD → TiKV → Pump → TiDB → TiFlash → Drainer → TiCDC → Prometheus → Grafana → Alertmanager.

| username: xingzhenxiang | Original post link

Try reload -N
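
For readers on a tiup-managed cluster, the suggestion above might look roughly like the following. This is a minimal sketch, not the poster's exact command: the cluster name `my-cluster` and the node ID `pump-157.example:8250` are hypothetical placeholders, and the helper only builds the command line instead of running it. It assumes `tiup cluster reload` and `tiup cluster restart` accept `-N` to limit the operation to specific nodes.

```python
import shlex


def tiup_node_cmd(action, cluster_name, node_id):
    """Build a tiup command line that targets a single node via -N.

    `action` is "reload" or "restart"; `node_id` is the host:port ID
    shown by `tiup cluster display` (assumed format).
    """
    if action not in ("reload", "restart"):
        raise ValueError(f"unsupported action: {action}")
    return ["tiup", "cluster", action, cluster_name, "-N", node_id]


# Hypothetical cluster name and node ID, for illustration only.
cmd = tiup_node_cmd("restart", "my-cluster", "pump-157.example:8250")
print(shlex.join(cmd))
```

Building the argument list first makes it easy to review exactly which node will be touched before anything is executed on a production cluster.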

| username: kuweilong666 | Original post link

This cluster is 4.0 and is managed with ansible-playbook.

| username: xingzhenxiang | Original post link

I have already moved a 3.1.0 cluster over to tiup.

| username: db_user | Original post link

That sequence only describes the normal startup order for the entire TiDB cluster. Each component and each node can be started individually, and a 4.0 cluster can also be switched over to tiup.

Or try binlogctl
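
As a hedged illustration of the binlogctl suggestion: binlogctl can list the pump and drainer states registered in PD with `-cmd pumps` and `-cmd drainers`. The sketch below only assembles those command lines. The PD address `http://pd.example:2379` is a hypothetical placeholder, and the location of the `binlogctl` binary depends on the deployment.

```python
import shlex


def binlogctl_status_cmd(pd_url, kind, binary="binlogctl"):
    """Build a binlogctl command line that lists pump or drainer states.

    `kind` must be "pumps" or "drainers"; `binary` is the path to
    binlogctl (an assumption, adjust to the actual install location).
    """
    if kind not in ("pumps", "drainers"):
        raise ValueError(f"unsupported kind: {kind}")
    return [binary, f"-pd-urls={pd_url}", "-cmd", kind]


# Hypothetical PD endpoint, for illustration only.
for kind in ("pumps", "drainers"):
    print(shlex.join(binlogctl_status_cmd("http://pd.example:2379", kind)))
```

Comparing the state reported for each pump with what is actually running on the machines can help narrow down where replication is stuck.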

| username: kuweilong666 | Original post link

It's working now. After restarting only the pump on 157, synchronization is back to normal.

| username: Raymond | Original post link

Is your binlog-ignore-error set to true?