Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: During a join, if the right table is very large, can TiDB run out of memory?
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Encountered Issues]
[Reproduction Path] What operations were performed that led to the issue
[Issue Phenomenon and Impact]
[Attachments]
Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.
Large table joins can easily cause OOM (Out of Memory); the specific situation needs to be examined in detail.
Thank you for your reply. Can TiDB handle large tables with disk-based processing, or is it limited to in-memory processing?
There are operators that write to disk, and each version has differences. Optimization is ongoing.
Is there a strategy for spilling operators to disk? I would like to know at what buffer size the system will start using disk-based processing. Also, can the buffer size be adjusted?
This is not documented in detail. The main mechanism is the per-query memory quota (the tidb_mem_quota_query system variable), which limits the memory a single SQL statement may use.
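For reference, the per-query memory quota mentioned above can be adjusted per session; a minimal sketch (the 2 GB value is only an illustration, not a recommendation):

```sql
-- Limit the memory a single SQL statement may use in this session.
-- The value is in bytes; 2 GB here is an example, tune it to your workload.
SET tidb_mem_quota_query = 2 << 30;

-- Verify the current setting.
SELECT @@tidb_mem_quota_query;
```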
You also need to enable a parameter for disk spill.
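The disk-spill parameter referred to above is, in the versions I'm aware of, the `oom-use-tmp-storage` entry in the TiDB config file; from v6.3 it is exposed as a system variable instead. A hedged sketch of enabling it via SQL:

```sql
-- On TiDB >= v6.3 the spill switch is a system variable (earlier versions
-- use the oom-use-tmp-storage entry in tidb.toml instead).
-- When enabled, supported operators spill intermediate data to disk once
-- the query exceeds its memory quota, rather than triggering OOM handling.
SET GLOBAL tidb_enable_tmp_storage_on_oom = ON;
```

The spill location and a size cap for spilled data are controlled by the `tmp-storage-path` and `tmp-storage-quota` config items.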
The main consideration for spilling to disk is which operators support it. Most operators currently do, but note that index nested-loop join (index NL join) does not. If the right table contains many duplicate records, the join can consume a lot of memory, and multiple nested joins can easily lead to OOM (Out of Memory).
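Since index NL join cannot spill, one practical workaround (a sketch; the table and column names are made up) is to check the plan with EXPLAIN and, if memory is a concern, steer the optimizer toward a hash join, which does support spilling:

```sql
-- Hypothetical tables: check which join operator the optimizer picked.
-- An "IndexJoin" in the plan cannot spill to disk; "HashJoin" can.
EXPLAIN SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id;

-- If the plan shows IndexJoin and the right side is large, hint a hash
-- join so the operator can spill under the memory quota.
SELECT /*+ HASH_JOIN(o, c) */ *
FROM orders o JOIN customers c ON o.cust_id = c.id;
```

The trade-off is that a hash join may be slower than an index join for selective lookups; the hint only makes sense when the memory risk outweighs that cost.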
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.