The dumpling export is very slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumpling导数很慢

| username: Kamner

【TiDB Usage Environment】Production Environment
【TiDB Version】
【Encountered Problem: Phenomenon and Impact】

Background:

Table T1 has 3.3 billion rows of data

Using Dumpling to export with a filter gets stuck at [progress] [tables="0/1 (0.0%)"]

dumpling -u root -p "123456" -P 4000 -h 127.0.0.1 -T gf_sta.T1 \
--where "data_date >= '2024-05-21'" \
-o ${BACKUP_DIR}

Using mysqldump to export the data with the same filter condition completes in a few minutes, generating a SQL file of over 4 GB

mysqldump -uroot -p123456 gf_sta T1 --where "data_date >= '2024-05-21'" \
> gf_sta_T1_20240524.sql

Question:

Why is dumpling stuck? Is there any parameter that needs to be added?

Solution:

Adding -r 200000 completed the export in 30 minutes.
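
For reference, the working command would look like this (a sketch, reusing the connection parameters and BACKUP_DIR from above):

# -r splits the table into chunks of roughly 200k rows so Dumpling can
# export them concurrently instead of in a single stream
dumpling -u root -p "123456" -P 4000 -h 127.0.0.1 -T gf_sta.T1 \
--where "data_date >= '2024-05-21'" \
-r 200000 \
-o ${BACKUP_DIR}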

Remaining Issue:

However, it is still much slower than mysqldump, which exports the same 30+ million rows in about 2 minutes. Will this be optimized in a new version?

| username: xfworld | Original post link

Add parameters and try again.

Reference documentation:

| username: 像风一样的男子 | Original post link

Does the data_date field have an index?
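
One quick way to check (a sketch, reusing the connection parameters from the dumpling command above):

# list the indexes on the table
mysql -h 127.0.0.1 -P 4000 -u root -p123456 -e "SHOW INDEX FROM gf_sta.T1"
# confirm the filter can actually use one of them
mysql -h 127.0.0.1 -P 4000 -u root -p123456 -e "EXPLAIN SELECT COUNT(*) FROM gf_sta.T1 WHERE data_date >= '2024-05-21'"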

| username: Kamner | Original post link

Those two parameters only limit the size of the exported files. My understanding is that the problem here is the fetch itself being too slow, since the SQL file size isn't growing at all.

| username: Kamner | Original post link

Yes, it has an index.

| username: xfworld | Original post link

-r is not enabled by default, which means it runs in single-threaded mode…
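
In other words, pairing -r with -t (Dumpling's thread count, default 4) is what gets the chunks exported in parallel; the -t value below is illustrative:

# -r defines the row chunks, -t controls how many are dumped concurrently
dumpling -u root -p "123456" -P 4000 -h 127.0.0.1 -T gf_sta.T1 \
--where "data_date >= '2024-05-21'" \
-r 200000 -t 8 \
-o ${BACKUP_DIR}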

| username: Kamner | Original post link

It ran for almost 2.5 hours but eventually failed with a snapshot expiration error at [total-rows=33051115]; the actual row count is 33994150.
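
Snapshot expiration usually means TiDB's GC reclaimed the snapshot mid-export; one documented workaround is to temporarily extend tikv_gc_life_time for long exports. A sketch ('3h' is illustrative; 10m is the default to restore afterwards):

# before the export: keep the snapshot alive longer than the dump will take
mysql -h 127.0.0.1 -P 4000 -u root -p123456 -e "UPDATE mysql.tidb SET VARIABLE_VALUE='3h' WHERE VARIABLE_NAME='tikv_gc_life_time'"
# after the export: restore the default
mysql -h 127.0.0.1 -P 4000 -u root -p123456 -e "UPDATE mysql.tidb SET VARIABLE_VALUE='10m' WHERE VARIABLE_NAME='tikv_gc_life_time'"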

| username: 这里介绍不了我 | Original post link

Try adding the -F option; the import speed will also improve.
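
For example (a sketch; the 256MiB value is illustrative):

# -F caps the size of each output file, so one huge table becomes many
# smaller files that can later be imported in parallel
dumpling -u root -p "123456" -P 4000 -h 127.0.0.1 -T gf_sta.T1 \
--where "data_date >= '2024-05-21'" \
-r 200000 -F 256MiB \
-o ${BACKUP_DIR}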

| username: Kamner | Original post link

Thank you, master. I missed it when reading the 5.4 documentation.

After adding -r 200000, it took 30 minutes to complete the export, which is still much slower than mysqldump.

| username: xfworld | Original post link

It’s better than not being able to export at all.

This parameter has some requirements, and if those requirements can be met, it will probably be faster.

| username: Kamner | Original post link

I have changed the value of -r several times, and it always takes about 30 minutes.

| username: tony5413 | Original post link

When exporting large tables, you generally need to use the -r or -F parameter.

| username: WinterLiu | Original post link

Got it. It seems that large exports still need to be split by row count or file size.

| username: 濱崎悟空 | Original post link

Try using the index and the -r/-F parameters, which are mentioned in the Dumpling documentation.