Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: kettle使用tidb,数据量大时,kettle内存oom (Kettle with TiDB: Kettle OOM when the data volume is large)
When using Kettle with TiDB, Kettle runs out of memory (OOM) once the number of rows matching the query reaches about 4 million. I'm asking for advice on how to optimize this. With Oracle, Kettle can synchronize more than 100 million rows without problems, and the synchronization is very efficient. I don't understand why it keeps hitting OOM with TiDB.
TiDB’s OOM is quite annoying…
Check whether Kettle has streaming reads (result-set streaming) configured.
Does MySQL have the same issue? Try MySQL first: Oracle and TiDB use different JDBC drivers in Java, so this is a simple way to check whether the problem comes from the JDBC driver's behavior.
Did the OOM occur in Kettle or the TiDB server?
The Kettle process is what hit the OOM (Out of Memory) error; the TiDB server itself is running normally. It looks as though all of the matching rows were loaded into the JVM heap, so cursor-based/streaming reads were not taking effect.
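The usual cause of this pattern with MySQL-protocol databases (TiDB included) is that MySQL Connector/J buffers the entire result set in memory by default, so all 4 million rows end up in the Kettle JVM. Below is a minimal plain-JDBC sketch of the two common workarounds, cursor fetch or row-by-row streaming; the host, table, and credentials are placeholders, and in Kettle the same URL parameter can normally be added on the database connection's Options tab rather than in code.

```java
import java.sql.*;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port/db/credentials; TiDB listens on 4000 by default.
        // useCursorFetch=true asks Connector/J to pull rows in chunks instead of
        // buffering the whole result set in the JVM heap.
        String url = "jdbc:mysql://tidb-host:4000/test?useCursorFetch=true&useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM big_table",              // hypothetical table
                     ResultSet.TYPE_FORWARD_ONLY,
                     ResultSet.CONCUR_READ_ONLY)) {
            // With useCursorFetch=true, a positive fetch size enables chunked fetching;
            // without it, setFetchSize(Integer.MIN_VALUE) switches Connector/J to
            // row-by-row streaming mode.
            ps.setFetchSize(1000);
            try (ResultSet rs = ps.executeQuery()) {
                long count = 0;
                while (rs.next()) {
                    // process each row here instead of collecting everything in memory
                    count++;
                }
                System.out.println("rows read: " + count);
            }
        }
    }
}
```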
MySQL and PostgreSQL are both working fine.
Let's look at a few aspects:
- Commit record count: 10,000
- Database connection parameters:
  - useServerPrepStmts=false
  - rewriteBatchedStatements=true
  - useCompression=true
- Parallelism: right-click the START component and check "Run Next Entries in Parallel" to configure it.

To avoid OOM, reduce the commit record count and lower the parallelism.
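For reference, here is a minimal JDBC sketch of how those connection parameters and the 10,000-row commit size might fit together. The host, table, columns, and credentials are hypothetical; in Kettle the parameters would normally be set on the connection's Options tab and the commit size on the output step rather than in code.

```java
import java.sql.*;

public class BatchWriteSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host/credentials; the three parameters mirror the checklist above.
        // rewriteBatchedStatements=true lets Connector/J rewrite a batch of INSERTs
        // into multi-row statements, which greatly reduces round trips.
        String url = "jdbc:mysql://tidb-host:4000/test"
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
        final int commitSize = 10_000;  // the "commit record count" from the checklist

        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO target_table (id, val) VALUES (?, ?)")) {  // hypothetical table
                for (int i = 1; i <= 1_000_000; i++) {       // stand-in for the real row source
                    ps.setInt(1, i);
                    ps.setString(2, "row-" + i);
                    ps.addBatch();
                    if (i % commitSize == 0) {
                        ps.executeBatch();   // flush the accumulated batch
                        conn.commit();       // commit every 10,000 rows to bound memory use
                    }
                }
                ps.executeBatch();           // flush any remaining rows
                conn.commit();
            }
        }
    }
}
```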
It is recommended to commit in batches. This is closely tied to the server's available memory: if memory is limited, use smaller batches.