Memory Usage Reaches 99% When Exporting with Dumpling

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumpling导出的时候发现内存使用率99% ("Found memory usage at 99% while exporting with Dumpling")

| username: 胡杨树旁

Last night around 21:00, I ran a time-segmented backup of a full table holding 1.2T of data and found that memory was almost exhausted. However, the maximum memory usage of the executed SQL was only around 248G, while the server has 512G of memory and hosts just one TiDB node. Checking the monitoring, I saw that the heapInUse memory usage was also very high. What does this metric mean?

| username: WalterWj | Original post link

heapInUse is heap memory; process memory is the total memory of the process.
The former can be roughly treated as the memory TiDB actually needs; process memory also includes memory that has not yet been reclaimed by garbage collection.
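If it helps, the two numbers can be compared directly from TiDB's Prometheus metrics endpoint. This is only a sketch, assuming the default status port 10080 and the standard Go runtime metric names:

```shell
# Sketch (assumed defaults): pull Go heap-in-use vs. resident process memory
# from TiDB's /metrics endpoint on the status port.
TIDB_STATUS="http://127.0.0.1:10080"   # placeholder address; point at your TiDB node
METRIC_RE='^(go_memstats_heap_inuse_bytes|process_resident_memory_bytes)'
# Heap-in-use is roughly "what TiDB needs"; resident memory also counts
# memory not yet returned to the OS by the Go garbage collector.
curl -s --max-time 3 "$TIDB_STATUS/metrics" | grep -E "$METRIC_RE" || true
```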

| username: 哈喽沃德 | Original post link

Can’t it be backed up in batches?

| username: xfworld | Original post link

It’s best to break down this SQL further.

| username: FutureDB | Original post link

The maximum memory usage was 248.5GB, and there was no OOM (Out of Memory) :sob:

| username: WalterWj | Original post link

Try using Dumpling's --where option instead of --sql.
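A minimal sketch of such an invocation (connection details, output path, and the `mydb.t` table name are placeholders; `createdata` is the column from the original post):

```shell
# Build the command as a string for illustration: --where lets Dumpling
# chunk the table by rows (-r) and cap file sizes (-F), instead of running
# one monolithic --sql statement.
CMD="dumpling -h 127.0.0.1 -P 4000 -u root \
-o /backup/t_export --filetype csv \
-T mydb.t -r 200000 -F 256MiB \
--where \"createdata < '2024-01-01'\""
echo "$CMD"
```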

| username: dba远航 | Original post link

The server database uses a total of approximately 420G of memory, with a single SQL query using only 248G.

| username: 胡杨树旁 | Original post link

The total data volume of this table is 1.2T. This backup is about 400G, exported by time with --sql "select * from t where createdata < '2024-01-01'". The rows matching the condition fall between Dec 25 and Jan 1.

| username: 胡杨树旁 | Original post link

Okay, the memory consumption of this SQL is indeed too large. Could it be related to the cross-region network? One server is in Beijing and the other in Shanghai: the client is on the Beijing network segment, and the database is on the Shanghai network segment.

| username: 胡杨树旁 | Original post link

If I want to export in CSV format, doesn't --where only export in SQL format?

| username: Jolyne | Original post link

You can put the WHERE clause in the filtering SQL, then you can use CSV, right?
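If I remember the docs right, --sql and --filetype csv can be combined, so the filter lives in the SQL while the output is still CSV. A sketch with placeholder connection details:

```shell
# The WHERE condition stays inside --sql; --filetype csv controls the output format.
CMD="dumpling -h 127.0.0.1 -P 4000 -u root \
-o /backup/t_csv --filetype csv \
--sql \"select * from t where createdata < '2024-01-01'\""
echo "$CMD"
```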

| username: 小毛毛虫 | Original post link

What is your tidb_mem_quota_query set to? A SQL query using over 200GB didn't hit the threshold?

| username: 像风一样的男子 | Original post link

You can split this large task into several smaller tasks to execute. After all, it’s just generating CSV files and merging them at the end. Also, make sure that the fields in the WHERE condition have indexes.
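A hedged sketch of that splitting, using the Dec 25 to Jan 1 window mentioned in the thread (everything else here is a placeholder):

```shell
# One Dumpling run per sub-range; each writes its own CSV directory, and
# the resulting files can be merged afterwards. Printed, not executed.
ranges="2023-12-25:2023-12-29 2023-12-29:2024-01-01"
for r in $ranges; do
  lo=${r%%:*}; hi=${r##*:}
  echo "dumpling -h 127.0.0.1 -P 4000 -u root -T mydb.t --filetype csv \
-o /backup/t_${lo} --where \"createdata >= '$lo' AND createdata < '$hi'\""
done
```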

| username: 胡杨树旁 | Original post link

Yes, the WHERE-condition column has an index. I have exported data before, but that was in SQL mode, and memory usage was never this high.

| username: 胡杨树旁 | Original post link

According to the official documentation, it is exported in this way.

| username: 胡杨树旁 | Original post link

I’m also curious why there is no OOM.

| username: 江湖故人 | Original post link

You could rule that out by running the export locally :thinking:

| username: 江湖故人 | Original post link

Try using the -r option?

| username: h5n1 | Original post link

Try adding this: --params "tidb_enable_chunk_rpc=0"
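--params passes session variables to the connections Dumpling opens, so it can sit alongside the other flags. A sketch with placeholder connection details:

```shell
# tidb_enable_chunk_rpc=0 disables the Chunk RPC encoding for the export
# sessions, the tweak suggested above for ballooning memory. Printed only.
CMD="dumpling -h 127.0.0.1 -P 4000 -u root -T mydb.t \
--params \"tidb_enable_chunk_rpc=0\" \
--where \"createdata < '2024-01-01'\" -o /backup/t_nochunk"
echo "$CMD"
```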

| username: WalterWj | Original post link

No way, --where should also be able to export CSV.

With --where, Dumpling automatically splits the export into batched SQL queries.
With --sql, it executes the given SQL and streams the result set; if writing out is too slow, memory usage can climb, possibly even to OOM.