How to Migrate the Drainer Component

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何迁移drainer组件

| username: qufudcj

[Test Environment for TiDB]

[TiDB Version] 4.0.4

[Encountered Issue: Phenomenon and Impact]

The cluster is deployed with tiup.

Recently, drainer's memory usage has surged. From other Q&A threads on the forum, the only fix seems to be upgrading the version, but an upgrade is not an option for now.

The machine that hosts drainer also runs pump and kafka, and it often runs out of memory, so I want to migrate drainer to a machine with more memory.

However, I only found the migration methods for pd/tidb/tikv/tiflash/ticdc in the documentation.

How can I migrate drainer?

Should I first use tiup cluster scale-in --node to take drainer offline,

then record the current ts,

and finally use tiup cluster scale-out scale-out.yaml, configuring drainer_servers in the file?
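The three steps asked about above can be sketched as a shell script. This is only a rough sketch under stated assumptions, not a verified procedure: the cluster name, the host addresses, and the Kafka address are hypothetical placeholders; it assumes binlogctl is available for looking up the drainer's last commit ts; and it assumes the start position is passed through the drainer config item initial-commit-ts, where -1 asks a drainer without a checkpoint to start from the latest timestamp in PD. With DRY_RUN=1 (the default here) the tiup and binlogctl commands are only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders, replace with your own values.
CLUSTER="${CLUSTER:-test-cluster}"
PD_ADDR="${PD_ADDR:-10.0.1.1:2379}"
OLD_DRAINER="${OLD_DRAINER:-10.0.1.11:8249}"
NEW_HOST="${NEW_HOST:-10.0.1.12}"
KAFKA_ADDRS="${KAFKA_ADDRS:-10.0.1.11:9092}"
COMMIT_TS="${COMMIT_TS:--1}"   # -1 = start from the latest ts (assumption, see lead-in)
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Topology fragment for the new drainer.
cat > scale-out.yaml <<EOF
drainer_servers:
  - host: ${NEW_HOST}
    port: 8249
    config:
      initial-commit-ts: ${COMMIT_TS}
      syncer.db-type: "kafka"
      syncer.to.kafka-addrs: "${KAFKA_ADDRS}"
EOF

# 1. Look up the old drainer's state (including its last commit ts) and note it down.
run binlogctl -pd-urls="http://${PD_ADDR}" -cmd drainers

# 2. Take the old drainer offline.
run tiup cluster scale-in "$CLUSTER" --node "$OLD_DRAINER"

# 3. Start the new drainer on the larger machine and check the result.
run tiup cluster scale-out "$CLUSTER" scale-out.yaml
run tiup cluster display "$CLUSTER"
```

To resume from a recorded position instead of the latest one, set COMMIT_TS to the recorded ts before running the script, and review the printed commands before setting DRY_RUN=0.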


| username: xfworld | Original post link

The memory surge in drainer is caused by too many DDLs. Unless drainer is redeployed, those DDLs will persist…

You can scale out a new drainer first, configured to run on the new machine, and decommission the old one afterwards.

Your approach also works, but there will be a window during which binlogs are not consumed (the ts is very important, so record it carefully).

You can refer to this document for a deeper understanding:

| username: qufudcj | Original post link

My downstream is Kafka, and it seems that relay log cannot be enabled.

Do you mean that I should first scale out to two drainers and then scale in to one? Is this the most reliable way?

| username: xfworld | Original post link

The relay log also obtains the starting position through ts.
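Because the ts decides where a drainer resumes, it can help to sanity-check a recorded value before using it. A TiDB TSO stores a physical timestamp in milliseconds in its high bits and an 18-bit logical counter in its low bits, so it can be decoded with plain shell arithmetic. The sketch below assumes bash (64-bit integers); the sample value is made up for illustration and does not come from a real cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Physical part of a TSO: milliseconds since the Unix epoch (high bits).
tso_physical_ms() { echo $(( $1 >> 18 )); }

# Logical part of a TSO: counter stored in the low 18 bits.
tso_logical() { echo $(( $1 & ((1 << 18) - 1) )); }

# Smallest TSO for a given wall-clock time in milliseconds (logical = 0).
ms_to_tso() { echo $(( $1 << 18 )); }

# Illustrative value only, not taken from a real cluster.
SAMPLE_TSO=419430400000000005

tso_physical_ms "$SAMPLE_TSO"   # prints 1600000000000
tso_logical "$SAMPLE_TSO"       # prints 5
ms_to_tso 1600000000000         # prints 419430400000000000
```

If the decoded physical time is far from the moment the old drainer was stopped, the recorded ts is probably not the one you meant to use.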

Yes. Your own approach works too; either way is fine.

| username: qufudcj | Original post link

However, the drainer configuration in the documentation is a bit confusing to me. Since it is marked as deprecated, how should I specify the ts value?

| username: xfworld | Original post link

The documentation you are looking at is not for version 4.X, right?

After version 5.X, this component has been replaced by TiCDC. It is also recommended that you upgrade your version.

| username: qufudcj | Original post link

Thank you, it was my oversight. However, our production environment is too complex. The cluster there is deployed with Ansible, and the replication chain is drainer -> kafka -> Ali DTS.

It seems that we need to completely uninstall binlog to switch from Ansible to TiUP, and then upgrade the version. The risk is still quite high.

| username: qufudcj | Original post link

I would like to ask another question. If I use my method, will the newly started drainer resynchronize all data from the very beginning? Since this is a test environment, starting from the latest ts is fine, but I want to avoid a full synchronization from the start.

| username: xfworld | Original post link

Take a close look at the images I posted in my reply.

| username: TiDBer_lm8fSeXQ | Original post link

I recommend enabling swap and setting it to 512 GB. It would take approximately 24 hours for the full 512 GB to be used up.