The disk on one pump machine filled up, and the drainer does not resume after cleanup

translator_bot · June 22, 2024, 7:07pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pump有一台机器磁盘满了，清理后drainer不走

| username: kuweilong666

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
Pumps run on machines 157, 186, and 187. The disk on pump machine 157 filled up, which caused that pump to stop writing logs. After freeing up space, the drainer still does not proceed and remains inactive.

[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

translator_bot · June 22, 2024, 7:07pm

| username: WalterWj

Try restarting the pump and drainer.

translator_bot · June 22, 2024, 7:07pm

| username: CuteRay

After cleaning up, restart the pump first.

translator_bot · June 22, 2024, 7:07pm

| username: kuweilong666

Restarting the pump requires restarting the whole TiDB cluster, which is quite troublesome. The key point is that the pump on 157 is now writing logs normally.

translator_bot · June 22, 2024, 7:07pm

| username: CuteRay

There’s no need for that. Why would you need to restart the whole TiDB cluster just to restart the pump? Each component of a TiDB cluster can be restarted individually.

translator_bot · June 22, 2024, 7:07pm

| username: kuweilong666

The cluster startup operation will start all components of the entire TiDB cluster in the order of PD → TiKV → Pump → TiDB → TiFlash → Drainer → TiCDC → Prometheus → Grafana → Alertmanager.

translator_bot · June 22, 2024, 7:07pm

| username: xingzhenxiang

Try `reload -N`, which targets only the specified node.

translator_bot · June 22, 2024, 7:07pm

| username: kuweilong666

The cluster is on 4.0, which is managed with ansible-playbook.

translator_bot · June 22, 2024, 7:07pm

| username: xingzhenxiang

I have already moved a 3.1.0 cluster over to tiup.

translator_bot · June 22, 2024, 7:07pm

| username: db_user

That startup sequence only describes the normal startup order for the entire TiDB cluster. Each component and each node can be started individually, and you can also switch a 4.0 cluster over to tiup.

Or try binlogctl.
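
As a rough sketch of what restarting a single pump could look like, the script below only prints candidate commands so they can be reviewed before being run by hand. The cluster name, the full IP address, and the PD address are hypothetical placeholders (the thread only gives "157"), `8250` is the default pump port, and the `-l` host limit assumes the tidb-ansible inventory lists the pump by that address.

```shell
#!/bin/sh
# Hypothetical values: replace with your own cluster name, pump address, and PD endpoint.
CLUSTER="my-cluster"
PUMP_HOST="10.0.0.157"
PUMP_NODE="$PUMP_HOST:8250"      # host:port; 8250 is the default pump port
PD_URL="http://127.0.0.1:2379"

# Each function only prints commands (dry run); nothing is executed against the cluster.

# tiup-managed cluster: restart just the one pump node.
tiup_restart_pump() {
  echo "tiup cluster restart $CLUSTER -N $PUMP_NODE"
}

# tidb-ansible-managed cluster: stop and start only the pump role on one host.
# Assumes the inventory lists the pump host by this address.
ansible_restart_pump() {
  echo "ansible-playbook stop.yml --tags=pump -l $PUMP_HOST"
  echo "ansible-playbook start.yml --tags=pump -l $PUMP_HOST"
}

# binlogctl: query the state of pumps and drainers registered in PD.
binlogctl_status() {
  echo "binlogctl -pd-urls=$PD_URL -cmd pumps"
  echo "binlogctl -pd-urls=$PD_URL -cmd drainers"
}

tiup_restart_pump
ansible_restart_pump
binlogctl_status
```

Only one of the two restart variants applies to a given cluster, depending on whether it is managed by tiup or by tidb-ansible. The binlogctl queries can be run before and after the restart to compare the reported pump and drainer states.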

translator_bot · June 22, 2024, 7:07pm

| username: kuweilong666

It’s working now. After restarting only the pump on 157, synchronization is back to normal.

translator_bot · June 22, 2024, 7:07pm

| username: Raymond

Is your `binlog-ignore-error` set to true?