After Drainer starts successfully, can it skip loadHistoryDDLJobs upon restart?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: drainer启动成功后,重启可以跳过loadHistoryDDLJobs吗 ("After Drainer starts successfully, can a restart skip loadHistoryDDLJobs?")

| username: TiDBer_lm8fSeXQ

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] V3.0.9
[Reproduction Path] The business code issues a large number of `create table if not exists` statements, so the historical DDL jobs have accumulated to more than 128 GB. Each loadHistoryDDLJobs run takes over 12 hours (memory is insufficient, so swap had to be enabled). Every later restart reloads loadHistoryDDLJobs, which greatly hurts work efficiency. Is it possible to start running from a savepoint instead?
[Encountered Problem: Problem Phenomenon and Impact] Same as the reproduction path above.

| username: Billmay表妹 | Original post link

Version 3.0.9 is too old, and many issues in older versions have been fixed in newer versions. You might consider upgrading your cluster.

Additionally, if you are experiencing insufficient memory, you might want to upgrade your configuration.

When Drainer starts, it loads all historical DDL jobs to correctly filter and synchronize data. If your business code frequently uses create table if not exists, resulting in too many historical DDL jobs, Drainer will need to load these historical DDL jobs each time it starts. This can lead to long startup times or even out-of-memory (OOM) issues.

If you want Drainer to skip loading historical DDL jobs at startup, you can add the --no-load-ddl-job parameter to the startup command. This way, Drainer will not load historical DDL jobs but will start synchronizing data from the last saved savepoint. However, this might cause some issues during data synchronization because some operations in the historical DDL jobs might affect the currently synchronizing data.

If you want Drainer to start from the savepoint after a restart, you can set the schema-version in the savepoint file to 0 and then restart Drainer. Drainer will then start synchronizing data from the savepoint instead of loading historical DDL jobs. The same caveat applies: some operations in the skipped historical DDL jobs might affect the data currently being synchronized.
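As a purely illustrative sketch of the edit described above (this reply is machine-translated, and the actual savepoint file layout depends on your checkpoint type and Drainer version, so the field names and example value here are assumptions to be verified against your own file):

```toml
# Hypothetical drainer savepoint file -- layout and field names illustrative only
commitTS = 400000000000000000   # last saved synchronization point (example value)
schema-version = 0              # set to 0 as suggested above to skip reloading history
```

After editing, restart Drainer and confirm in its logs that replication resumes from the expected commitTS.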

| username: Raymond | Original post link

Does this phenomenon only occur when the drainer's downstream is a TiDB cluster? If the downstream is a file, it shouldn't happen, right?

| username: redgame | Original post link

As I recall, this shouldn't be skipped; adding more memory is the better approach.

| username: TiDBer_lm8fSeXQ | Original post link

  1. The prerequisite for upgrading the cluster is backing up the data; upgrading directly without a backup carries technical risk. Precisely because we want to upgrade the cluster, we need to perform both full and incremental backups.
  2. For cloud servers, the physical machine is capped at 128 GB, so adding memory is not feasible for the business.
  3. I could not find the `--no-load-ddl-job` parameter in the source code, though it is possible I did not search correctly.

| username: TiDBer_lm8fSeXQ | Original post link

Yes. After switching to 128 GB of memory and adding swap, the binlog caught up. Swap usage stayed close to 100% for a long time, and it took 5 days to catch up.

| username: TiDBer_lm8fSeXQ | Original post link

Judging from the source code, it is unrelated to the downstream, and this is explained in the official documentation. When Drainer starts, it requests all historical DDL job information from TiKV, filters those DDL jobs, and for the DDLs before the initial-commit-ts specified at startup (or the commit_ts saved in the checkpoint) constructs the database and table structure information in memory. Drainer thus holds a snapshot of the databases and tables at that ts. When it reads a DDL-type binlog, it updates the schema information; when it reads a DML-type binlog, it generates SQL based on the schema information.
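The mechanism described above can be sketched in a few lines of Go. This is not drainer's actual code; the `DDLJob` struct and `buildSnapshot` function are illustrative names showing the idea of replaying historical DDL jobs up to a starting ts to build an in-memory schema snapshot:

```go
package main

import "fmt"

// DDLJob is a simplified stand-in for a historical DDL job record.
type DDLJob struct {
	CommitTS int64
	Table    string
	Query    string // e.g. "CREATE TABLE t1 ..."
}

// buildSnapshot replays every job with CommitTS <= startTS into an
// in-memory table map. Conceptually, later DDL binlogs would update
// this map, and DML binlogs would consult it to generate SQL.
func buildSnapshot(jobs []DDLJob, startTS int64) map[string]string {
	schema := make(map[string]string)
	for _, j := range jobs {
		if j.CommitTS <= startTS {
			schema[j.Table] = j.Query
		}
	}
	return schema
}

func main() {
	jobs := []DDLJob{
		{CommitTS: 100, Table: "t1", Query: "CREATE TABLE t1 (id INT)"},
		{CommitTS: 200, Table: "t2", Query: "CREATE TABLE t2 (id INT)"},
		{CommitTS: 300, Table: "t3", Query: "CREATE TABLE t3 (id INT)"},
	}
	// With initial-commit-ts = 250, only t1 and t2 enter the snapshot.
	snap := buildSnapshot(jobs, 250)
	fmt.Println(len(snap)) // prints 2
}
```

Since every job before the starting ts must be replayed, the cost of this step grows with the total history, which is why 128 GB of accumulated DDL jobs makes each restart so expensive.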

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.