Using TiDB with Kettle: Kettle Runs Out of Memory (OOM) with Large Data Volumes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: kettle使用tidb,数据量大时,kettle内存oom

| username: txf超越123666

When using TiDB with Kettle, Kettle runs out of memory (OOM) once the number of rows matching the query reaches 4 million. I am asking everyone for advice on how to optimize this. When using Kettle with Oracle, it can synchronize normally even with more than 100 million rows, and the synchronization is very efficient. I don’t understand why TiDB keeps hitting OOM.

| username: 我是咖啡哥 | Original post link

TiDB’s OOM is quite annoying…
Check whether Kettle’s connection is configured for streaming reads (streaming result sets).

| username: Icemap | Original post link

Does MySQL have the same issue? Try MySQL first: Oracle and TiDB use different JDBC drivers in Java, so this is a quick way to check whether the behavior is caused by the JDBC driver.

| username: tracy0984 | Original post link

Did the OOM occur in Kettle or the TiDB server?

| username: txf超越123666 | Original post link

The Kettle process encountered the OOM (Out of Memory) error; the TiDB service itself is functioning normally. It seems that all the qualifying rows were loaded into the JVM heap, so cursors and streaming reads were not taking effect.
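This matches the default behavior of MySQL-protocol JDBC drivers: MySQL Connector/J (also commonly used to connect to TiDB) buffers the entire result set in the JVM unless streaming is enabled explicitly. A minimal sketch of enabling it, assuming Connector/J; the host, port, credentials, and table name are illustrative only:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class StreamingReadSketch {
    // Connection properties that enable cursor-based fetch over the MySQL protocol.
    static Properties streamingProps() {
        Properties props = new Properties();
        props.setProperty("user", "root");           // illustrative credentials
        props.setProperty("password", "");
        props.setProperty("useCursorFetch", "true"); // fetch rows in chunks, not all at once
        return props;
    }

    // Streams rows from a (hypothetical) big_table without buffering them all in the heap.
    static void readBigTable(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, streamingProps());
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000); // chunk size used together with useCursorFetch=true
            // Alternative without useCursorFetch: stmt.setFetchSize(Integer.MIN_VALUE)
            // switches Connector/J into row-by-row streaming mode.
            try (ResultSet rs = stmt.executeQuery("SELECT id FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time
                }
            }
        }
    }
}
```

Either option keeps the driver from materializing all 4 million rows at once, which is the usual cause of this kind of client-side OOM.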

| username: txf超越123666 | Original post link

Study and research

| username: txf超越123666 | Original post link

MySQL and PostgreSQL are both working fine.

| username: gcworkerishungry | Original post link

Let’s look at a few aspects:

  1. Commit size (number of records per commit): 10,000;
  2. Database connection parameters:
    useServerPrepStmts=false
    rewriteBatchedStatements=true
    useCompression=true
  3. Parallelism:
    Right-click the START step and check “Run Next Entries in Parallel” to set it up.

To avoid OOM, you can reduce the number of commit records and decrease parallelism.
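The connection parameters above can be set on Kettle’s database connection (Options tab) or appended to the JDBC URL directly. A hypothetical URL, assuming TiDB’s default port 4000 and a database named `mydb`:

```
jdbc:mysql://127.0.0.1:4000/mydb?useServerPrepStmts=false&rewriteBatchedStatements=true&useCompression=true
```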

| username: Running | Original post link

It is recommended to commit in batches. This is closely tied to the server’s memory: if memory is limited, use smaller batches.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.