tidb-lightning OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-lightning oom

| username: 田帅萌7

On TiDB 7.1.5, tidb-lightning hit an out-of-memory error while importing data in local mode.
[root@bjx-152-30-42 tidb]# head -n 30 nohup.out
Verbose debug logs will be written to tidb-lightning-5.log

+---+----------------------------------------------+-------------+--------+
| # | CHECK ITEM                                   | TYPE        | PASSED |
+---+----------------------------------------------+-------------+--------+
| 1 | Source csv files size is proper              | performance | true   |
+---+----------------------------------------------+-------------+--------+
| 2 | the checkpoints are valid                    | critical    | true   |
+---+----------------------------------------------+-------------+--------+
| 3 | table schemas are valid                      | critical    | true   |
+---+----------------------------------------------+-------------+--------+
| 4 | all importing tables on the target are empty | critical    | true   |
+---+----------------------------------------------+-------------+--------+
| 5 | Cluster version check passed                 | critical    | true   |
+---+----------------------------------------------+-------------+--------+
| 6 | Lightning has the correct storage permission | critical    | true   |
+---+----------------------------------------------+-------------+--------+
| 7 | no CDC or PiTR task found                    | critical    | true   |
+---+----------------------------------------------+-------------+--------+

fatal error: out of memory allocating heap arena metadata

runtime stack:
runtime.throw({0x5258650?, 0x0?})
/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f1dea221ca0 sp=0x7f1dea221c70 pc=0x18f363d
runtime.(*mheap).sysAlloc(0x8569d80, 0xf7ffffffff?, 0x8579f18, 0x1)
/usr/local/go/src/runtime/malloc.go:725 +0x357 fp=0x7f1dea221d38 sp=0x7f1dea221ca0 pc=0x18c5617
runtime.(*mheap).grow(0x8569d80, 0x5e0?)
/usr/local/go/src/runtime/mheap.go:1472 +0x7f fp=0x7f1dea221db0 sp=0x7f1dea221d38 pc=0x18e2f1f
runtime.(*mheap).allocSpan(0x8569d80, 0x5e0, 0x0, 0xe3?)

nohup.out (54.8 MB)

145G /data/sorted-kv-dir-5/

| username: h5n1

I also ran into OOM with version 7.1.1. I then switched to Lightning 7.5.1 and upgraded the cluster to 7.5.1 as well.

| username: Billmay表妹

Was the issue resolved after moving to version 7.5.1?

| username: h5n1

My problem is solved.

| username: 田帅萌7

Current workaround:

I lowered region-concurrency, but memory consumption was still high.
Later I split the table and imported it in smaller pieces, after which memory usage stayed stable.
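
The thread does not say how the table was split. As one possible reading, here is a minimal Python sketch that splits a single large CSV export into smaller numbered files. Assumptions: the source data is CSV with a header row, and the chunks follow the `db.table.NNN.csv` naming pattern that Lightning's data source documentation describes for tables split across several files. The function name `split_csv` and all paths are hypothetical, not something the original poster showed.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, db, table, rows_per_chunk, has_header=True):
    """Stream `src` into chunk files of at most `rows_per_chunk` data rows.

    Chunks are named <db>.<table>.<NNN>.csv (assumed naming convention).
    Rows are copied one at a time, so memory use does not grow with file size.
    Returns the list of chunk paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    writer = None
    count = 0
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        # Remember the header so it can be repeated at the top of every chunk.
        header = next(reader, None) if has_header else None
        for row in reader:
            # Start a new chunk on the first row or when the current one is full.
            if writer is None or count >= rows_per_chunk:
                if out is not None:
                    out.close()
                path = out_dir / f"{db}.{table}.{len(written) + 1:03d}.csv"
                out = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                if header is not None:
                    writer.writerow(header)
                written.append(path)
                count = 0
            writer.writerow(row)
            count += 1
    if out is not None:
        out.close()
    return written
```

A call such as `split_csv("/data/export/big.csv", "/data/export/chunks", "mydb", "big_table", 5_000_000)` (hypothetical paths and names) would write `mydb.big_table.001.csv`, `mydb.big_table.002.csv`, and so on. Whether the chunks can be imported in separate runs should be checked against the Lightning documentation, since the precheck output above includes an "all importing tables on the target are empty" item.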

| username: 田帅萌7

However, this does not fundamentally solve the problem. For a large table with a high data volume, memory consumption will presumably still be high.

| username: 田帅萌7

If the official team has already optimized this, for example if version 7.5.1 resolves the issue, please mention it here.

| username: TiDBer_QYr0vohO

It seems that version 7.5.1 does have optimizations for Lightning.