Drainer Out of Memory

translator_bot · June 22, 2024, 7:29am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: drainer out of memory

| username: TiDBer_yYMGzrbZ

[TiDB Usage Environment] Production Environment
[TiDB Version] V3.0.9
[Reproduction Path] After adding a drainer node with Ansible, the drainer process appears at startup but never listens on its port.
[Encountered Problem: Symptoms and Impact]
While the drainer is starting up, its memory usage spikes until the system runs out of memory and the OOM killer terminates the drainer.
The downstream is files written to a local directory, not a database.
[Resource Configuration] 16 cores, 32G memory
[Attachments: Screenshots/Logs/Monitoring]

| username: weixiaobing

Has this database been around for a long time? Are there many DDLs? If so, it might be a known issue.

| username: TiDBer_yYMGzrbZ

Yes, the database has been running for a long time and has seen many DDL operations. I want to upgrade to a newer version, so I planned to back up and synchronize the data first. To do that, I enabled binlog and added a drainer node. The drainer exhausted the memory before it even finished starting up.

| username: Raymond

Is the downstream binlog consumer a file, or another TiDB or MySQL?

| username: xfworld

That’s a very old version…

The 3.0.x series has reached patch release 20, which has probably fixed a lot of bugs. You might want to consider upgrading.
It’s best to upgrade to a mainline version, as there will be fewer issues.

| username: TiDBer_yYMGzrbZ

It is a file.

| username: buddyyuan

Drainer needs to pull the full DDL history when it starts. It seems that you have too much historical DDL.

| username: Raymond

This could be caused by a large transaction. The drainer code has no mechanism for spilling large transactions to disk. Provided you have a full backup to start from, you can manually edit the commitTS in the drainer’s savepoint file and then restart the drainer to skip that large transaction.
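
For anyone trying this, here is a minimal Python sketch of the savepoint edit. It assumes the savepoint is a small text file containing a line of the form `commitTS = <integer>`; that layout, the helper name `set_commit_ts`, and the `.bak` backup convention are illustrative assumptions rather than anything confirmed in this thread, so check your own file first.

```python
import re
from pathlib import Path

# Assumption: the savepoint contains a line such as
#   commitTS = 415000000000000000
# Only spaces/tabs are allowed around the value so other lines stay untouched.
_COMMIT_TS_LINE = re.compile(
    r"^([ \t]*commitTS[ \t]*=[ \t]*)(\d+)[ \t]*$", re.MULTILINE
)


def set_commit_ts(savepoint_path, new_commit_ts):
    """Replace commitTS in the savepoint file and return the previous value.

    A copy of the original file is written next to it with a .bak suffix
    so the change can be reverted by hand.
    """
    path = Path(savepoint_path)
    text = path.read_text()

    match = _COMMIT_TS_LINE.search(text)
    if match is None:
        raise ValueError(f"no commitTS line found in {path}")
    old_commit_ts = int(match.group(2))

    # Keep a backup before overwriting anything.
    path.with_name(path.name + ".bak").write_text(text)

    new_value = int(new_commit_ts)  # reject non-integer input early
    updated = _COMMIT_TS_LINE.sub(
        lambda m: f"{m.group(1)}{new_value}", text, count=1
    )
    path.write_text(updated)
    return old_commit_ts
```

The drainer should presumably be stopped before the file is edited. The new commitTS should not be earlier than the position covered by the full backup, otherwise the skipped range may not be covered by anything.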

| username: TiDBer_yYMGzrbZ

The first time I started the drainer, it failed to come up, so no savepoint was generated. Can I write one by hand, put it in place, and then start the drainer?
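
Whether a hand-written savepoint is accepted is not confirmed in this thread. Either way, the position it would carry is a commitTS, which in TiDB is a TSO: a physical time in milliseconds shifted left by 18 bits, with a logical counter in the low 18 bits. A rough Python sketch for converting between a TSO and wall-clock time (the function names are made up for illustration):

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def tso_to_datetime(tso):
    """Return the physical (wall-clock) part of a TSO as a UTC datetime."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def datetime_to_tso(dt, logical=0):
    """Build a TSO from a timezone-aware datetime and a logical counter.

    A naive datetime would be interpreted in local time by timestamp(),
    so pass an explicit tzinfo.
    """
    physical_ms = int(round(dt.timestamp() * 1000))
    return (physical_ms << LOGICAL_BITS) | logical


# Example: the TSO corresponding to the moment a backup was taken.
backup_time = datetime(2024, 6, 22, 0, 0, 0, tzinfo=timezone.utc)
backup_tso = datetime_to_tso(backup_time)
```

Decoding a commitTS with `tso_to_datetime` is also a quick way to sanity-check that a value copied from a backup or a log points at the time you expect.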

| username: Raymond

So there is no savepoint file for the drainer? The drainer was added later, right? Normally, large transactions shouldn’t be happening all the time, should they?

| username: TiDBer_yYMGzrbZ

Yes, it was added later, and it is now up and running (it needed 32GB of RAM plus 250GB of swap and took more than 10 hours). When the drainer starts, it loads the historical DDL, which requires a lot of memory. Without enough memory, it hits OOM and the system kills the process. At the moment, data synchronization cannot catch up and the delay is significant, so swap is still not a usable option.
