Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: kettle使用tidb,数据量大时,kettle内存oom (Kettle with TiDB: Kettle OOM when the data volume is large)
When using Kettle with TiDB, Kettle runs out of memory (OOM) once the number of rows matching the query reaches about 4 million. I'm asking for advice on how to optimize this. With Oracle, Kettle can synchronize more than 100 million rows without problems, and the synchronization is very efficient. I don't understand why it keeps hitting OOM with TiDB.
TiDB’s OOM is quite annoying…
Check whether Kettle has streaming reads (result-set streaming) configured.
Does MySQL have the same issue? Try MySQL first: Oracle and TiDB use different JDBC drivers in Java, so this is a simple way to check whether the problem comes from the JDBC driver's behavior.
Did the OOM occur in Kettle or the TiDB server?
The Kettle process is what hit the OOM (Out of Memory) error; the TiDB server itself is running normally. It looks as though all of the matching rows were loaded into the JVM heap, so cursor-based/streaming reads were not taking effect.
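The usual cause of this pattern with MySQL-protocol databases (TiDB included) is that MySQL Connector/J buffers the entire result set in memory by default, so all 4 million rows end up in the Kettle JVM. Below is a minimal plain-JDBC sketch of the two common workarounds, cursor fetch or row-by-row streaming; the host, table, and credentials are placeholders, and in Kettle the same URL parameter can normally be added on the database connection's Options tab rather than in code.

```java
import java.sql.*;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port/db/credentials; TiDB listens on 4000 by default.
        // useCursorFetch=true asks Connector/J to pull rows in chunks instead of
        // buffering the whole result set in the JVM heap.
        String url = "jdbc:mysql://tidb-host:4000/test?useCursorFetch=true&useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM big_table",              // hypothetical table
                     ResultSet.TYPE_FORWARD_ONLY,
                     ResultSet.CONCUR_READ_ONLY)) {
            // With useCursorFetch=true, a positive fetch size enables chunked fetching;
            // without it, setFetchSize(Integer.MIN_VALUE) switches Connector/J to
            // row-by-row streaming mode.
            ps.setFetchSize(1000);
            try (ResultSet rs = ps.executeQuery()) {
                long count = 0;
                while (rs.next()) {
                    // process each row here instead of collecting everything in memory
                    count++;
                }
                System.out.println("rows read: " + count);
            }
        }
    }
}
```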
MySQL and PostgreSQL are both working fine.
Let's look at a few aspects:
- Commit record count: 10,000
- Database connection parameters:
  - useServerPrepStmts=false
  - rewriteBatchedStatements=true
  - useCompression=true
- Parallelism: right-click the START component and check "Run Next Entries in Parallel" to configure it.

To avoid OOM, reduce the commit record count and lower the parallelism.
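For reference, here is a minimal JDBC sketch of how those connection parameters and the 10,000-row commit size might fit together. The host, table, columns, and credentials are hypothetical; in Kettle the parameters would normally be set on the connection's Options tab and the commit size on the output step rather than in code.

```java
import java.sql.*;

public class BatchWriteSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host/credentials; the three parameters mirror the checklist above.
        // rewriteBatchedStatements=true lets Connector/J rewrite a batch of INSERTs
        // into multi-row statements, which greatly reduces round trips.
        String url = "jdbc:mysql://tidb-host:4000/test"
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
        final int commitSize = 10_000;  // the "commit record count" from the checklist

        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO target_table (id, val) VALUES (?, ?)")) {  // hypothetical table
                for (int i = 1; i <= 1_000_000; i++) {       // stand-in for the real row source
                    ps.setInt(1, i);
                    ps.setString(2, "row-" + i);
                    ps.addBatch();
                    if (i % commitSize == 0) {
                        ps.executeBatch();   // flush the accumulated batch
                        conn.commit();       // commit every 10,000 rows to bound memory use
                    }
                }
                ps.executeBatch();           // flush any remaining rows
                conn.commit();
            }
        }
    }
}
```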
It is recommended to commit in batches. This is closely tied to the server's available memory: if memory is limited, use smaller batches.