Dumpling causes TiDB node OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumpling导致tidb节点oom

| username: China-Dwan

[TiDB Usage Environment] Production Environment / Test / Poc
Production Environment
[TiDB Version]
5.4.0
[Reproduction Path] What operations were performed when the issue occurred
./dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 1 -o /data/bak -r 20000 -F256MiB --filter "test.*" --tidb-mem-quota-query 2147483648 --params "tidb_distsql_scan_concurrency=1"
[Encountered Issue: Issue Phenomenon and Impact]
When using dumpling to export SQL files, the TiDB node experienced an OOM (Out of Memory) issue.
[Resource Configuration]
TiDB node specification: 4 cores / 16 GB; data to export: about 70 GB across multiple tables
[Attachments: Screenshots/Logs/Monitoring]

| username: China-Dwan | Original post link

The data volume varies from table to table in the database: some tables are a few hundred MB, and some are over ten GB.

| username: 小龙虾爱大龙虾 | Original post link

Try the -T option to reduce the concurrency.

| username: China-Dwan | Original post link

-t has already been specified as 1. -T is for specifying the table.
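(For reference, a minimal sketch of the two flags, with test.big_table as a hypothetical table name: -T/--tables-list limits which tables are exported, while -t/--threads controls the export concurrency.)

./dumpling -u root -P 4000 -h 127.0.0.1 -o /data/bak -t 1 -T test.big_table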

| username: China-Dwan | Original post link

The documentation describes it as follows:

| username: China-Dwan | Original post link

The purpose of this data migration is to upgrade from v5.4 to v6.5.5. To keep the cluster safe, a new cluster on the higher version was set up, and the data is migrated over to complete the upgrade. I tested with dumpling v5.4.0 and dumpling v6.5.5; both have this issue.

| username: h5n1 | Original post link

Check the TiDB logs from just before the OOM.
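(For reference, a hedged way to do this on the tidb-server host; the log path below is an assumption and depends on the deployment layout. TiDB records expensive queries in its log, and the kernel OOM killer leaves a trace in dmesg.)

grep -i "expensive_query" /path/to/tidb.log
dmesg -T | grep -i "out of memory"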

| username: 小龙虾爱大龙虾 | Original post link

Well, it seems that -t specifies concurrency. Check the slow queries in the dashboard to see which SQL caused the OOM.
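(For reference, besides the Dashboard, the slow log can also be queried directly from SQL; a sketch that sorts by the Mem_max field of INFORMATION_SCHEMA.SLOW_QUERY, assuming the slow log is enabled:)

SELECT `Time`, Query_time, Mem_max, Query FROM information_schema.slow_query ORDER BY Mem_max DESC LIMIT 10;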

| username: China-Dwan | Original post link

The specific slow-log entry can be located, and the information retained by the server's OOM snapshot has also been found under the /tmp directory. However, there is no issue when the SQL is executed on its own. The specific SQL is as follows:
select * from test WHERE id >= 2330606 AND id < 2734876 ORDER BY id;
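(For reference, one hedged way to compare the standalone run with what dumpling triggers is to run the same statement under EXPLAIN ANALYZE, which reports per-operator execution details, including memory, in TiDB:)

EXPLAIN ANALYZE SELECT * FROM test WHERE id >= 2330606 AND id < 2734876 ORDER BY id;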

| username: China-Dwan | Original post link

[Screenshot from the original post; not available in this translation.]

| username: China-Dwan | Original post link

I think the problem might be related to the configuration of the network. You can try to check the network settings and see if there are any issues.

| username: China-Dwan | Original post link

[Screenshot from the original post; not available in this translation.]

| username: h5n1 | Original post link

Check the heap at that time to see what is consuming the most memory.

| username: China-Dwan | Original post link

What tool do I need to use to view this?

| username: h5n1 | Original post link

Could you please upload the file?

| username: China-Dwan | Original post link

heap2023-11-28T15:38:21+08:00 (278.7 KB)
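(For reference, a heap profile like this is in Go pprof format and can be inspected with the Go toolchain, assuming it is installed; a sketch using the file name above:)

go tool pprof -top 'heap2023-11-28T15:38:21+08:00'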

| username: China-Dwan | Original post link

The main reason is that the default value of tidb_distsql_scan_concurrency is 15, which is relatively high. When the number of concurrent queries is large, it will cause a large number of threads to be created, resulting in high CPU usage. You can try to reduce the value of tidb_distsql_scan_concurrency to alleviate this problem.
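(For reference, a hedged sketch for checking the current value and lowering it cluster-wide; the value 5 is only an example, and the --params option in the original command already sets it to 1 for dumpling's own session:)

SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';
SET GLOBAL tidb_distsql_scan_concurrency = 5;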

| username: China-Dwan | Original post link

Is it possible that dumpling triggered some bugs in the tidb-server, causing a memory leak?

| username: xingzhenxiang | Original post link

You can try using dumpling v7.1.2 or higher. I exported 18TB of data without any crashes.

| username: h5n1 | Original post link

Try using --params "tidb_enable_chunk_rpc=0".
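(For reference, a minimal sketch of the original command with this suggestion applied; whether several session variables can be combined in a single --params flag should be verified against the installed dumpling version:)

./dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 1 -o /data/bak -r 20000 -F256MiB --filter "test.*" --tidb-mem-quota-query 2147483648 --params "tidb_enable_chunk_rpc=0"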