Drainer Frequently Crashes and Cannot Automatically Recover

translator_bot · June 22, 2024, 10:22pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: drainer经常性挂掉，不能自动恢复

| username: wgimperial

Symptom: The drainer component in the cluster frequently goes down, and it is difficult to restart and recover it.
Investigation: Near the error in the log there is the message “receive big size binlog”, with a binlog size of over 100MB. I found a similar issue on the forum; see conclusion 3 in the article “What does SQL synchronized from upstream to Kafka through drainer look like in Kafka” (上游sql通过drainer同步到kafka时在kafka中是什么样子的, qhd2004’s column, TiDB community).

Other information: drainer and pump both run with the default configuration.
Question: What are the strategies or suggestions for handling this kind of issue? Is there a general solution, or a way to prevent the drainer from frequently exiting abnormally?

translator_bot · June 22, 2024, 10:22pm

| username: songxuecheng | Original post link

Could you check whether the process ran out of memory?

translator_bot · June 22, 2024, 10:22pm

| username: wgimperial | Original post link

Do you mean the drainer’s memory?

translator_bot · June 22, 2024, 10:22pm

| username: songxuecheng | Original post link

Check the machine’s log to see if there is an OOM (Out of Memory) error.
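
One way to run this check is to filter the kernel log for OOM-killer messages that mention the drainer process. The following is only a minimal sketch: the helper name `find_oom_lines` and the sample lines are made up for illustration, and it assumes a Linux host whose kernel log text (for example the output of `dmesg -T`) has already been captured as a string.

```python
import re

# Phrases the Linux kernel typically writes when the OOM killer runs.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(log_text, process_name=None):
    """Return kernel-log lines that look like OOM-killer events.

    If process_name is given, keep only the lines that mention it.
    """
    hits = []
    for line in log_text.splitlines():
        if not OOM_PATTERN.search(line):
            continue
        if process_name is None or process_name in line:
            hits.append(line)
    return hits


# Made-up sample lines, for illustration only.
sample = "\n".join([
    "[Mon Jun 10 00:01:02 2024] drainer invoked oom-killer: gfp_mask=0x100cca, order=0",
    "[Mon Jun 10 00:01:02 2024] Out of memory: Killed process 4321 (drainer)",
    "[Mon Jun 10 00:05:00 2024] eth0: link up",
])
for hit in find_oom_lines(sample, process_name="drainer"):
    print(hit)
```

An empty result only means the captured log text contains no such lines; older kernel messages may already have been rotated out.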

translator_bot · June 22, 2024, 10:22pm

| username: wgimperial | Original post link

There is no OOM information in the system logs of the machine.

translator_bot · June 22, 2024, 10:22pm

| username: jansu-dev | Original post link

Looking at the code location of this warning: it indicates that a single binlog entry is too large, probably because of a large transaction. On its own, this warning cannot pinpoint the cause of the crash.
Since the version is v6.1.0, I suggest that you stop using TiDB Binlog and switch to TiCDC. TiDB Binlog is no longer officially maintained.
If you still want to investigate the root cause, it’s recommended to collect and provide Clinic data, so that both logs and monitoring can be reviewed.
As for TiDB Binlog in general: read through the official documentation once, and then check the FAQ. That should be sufficient, since TiDB Binlog is quite old now.
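
To see whether the crashes line up with large transactions, it may help to list every occurrence of the warning in the drainer log and compare the positions with the crash times. This is a rough sketch under assumptions: only the phrase “receive big size binlog” comes from this thread, while the helper name `find_big_binlog_warnings` and the sample log lines are invented, and the real drainer log layout may differ.

```python
WARNING_PHRASE = "receive big size binlog"


def find_big_binlog_warnings(lines):
    """Return (line_number, line) pairs for lines containing the warning phrase."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if WARNING_PHRASE in line
    ]


# Invented sample lines; the real drainer log format may differ.
sample_log = [
    "[2024/06/10 00:00:01] [INFO] write save point\n",
    "[2024/06/10 00:00:05] [WARN] receive big size binlog\n",
    "[2024/06/10 00:00:09] [INFO] write save point\n",
]
for number, line in find_big_binlog_warnings(sample_log):
    print(number, line)
```

An open file object can be passed in place of `sample_log`, because the function only iterates over lines.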

translator_bot · June 22, 2024, 10:22pm

| username: Raymond | Original post link

TiDB Binlog has a bug in v5.4.0 and earlier: after a large transaction, the memory of the drainer process often grows and is not released. You can check whether you are seeing this behavior.

github.com/pingcap/tidb-binlog

memory leak in drainer after a big transaction

opened 06:34AM - 16 Nov 21 UTC

closed 09:29AM - 18 Nov 21 UTC

GMHDBJD

type/bug

## Bug Report

Please answer these questions before submitting your issue. Thanks!

1. What did you do? If possible, provide a recipe for reproducing the error.
   upstream execute a big transaction (e.g. delete many rows) daily (e.g. at 12:00 a.m.)
2. What did you expect to see?
   drainer memory reduce after the transaction was replicated.
3. What did you see instead?
   drainer memory didn't reduce.
4. Please provide the relate downstream type and version of drainer. (run `drainer -V` in terminal to get drainer's version)
   v4.0.14
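
To check for the behavior described in this issue, one option is to sample the resident memory of the drainer process before and after a large transaction and see whether it comes back down. The sketch below assumes a Linux host where `/proc/<pid>/status` is readable; the helper names `parse_vm_rss_kib` and `read_rss_kib` are hypothetical, and the drainer PID has to be looked up separately.

```python
import re


def parse_vm_rss_kib(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process on Linux."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kib(status_file.read())


# Made-up status excerpt, for illustration only.
sample_status = "Name:\tdrainer\nVmPeak:\t 9000000 kB\nVmRSS:\t  204800 kB\n"
print(parse_vm_rss_kib(sample_status))
```

Calling `read_rss_kib` at intervals around the large transaction gives a simple series to compare against the memory graphs in monitoring.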
