Analysis of TiDB OOM Situation

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB OOM situation analysis (tidb oom情况分析)

| username: 大鱼海棠

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 5.4.1
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Symptoms and Impact]


Why is memory usage amplified by 2-3 times? Can any expert explain this?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld

On a single node this is easy to understand: the larger the transaction, the more memory it occupies. Some optimizations may spill data to disk temporarily and then replay it through memory piece by piece.

In a distributed system, however, there are multiple nodes and each one holds only a portion of the data. That data is not laid out in whatever structure would be optimal for a given transaction, so some amplification is unavoidable.

Refer to the official explanation:

Here is a diagram drawn by someone else, borrowed for illustration:

The main point of the diagram is the multiple rounds of coordination that two-phase commit requires. Each round must save state to guarantee consistency: if the commit succeeds, all writes are applied together; if it fails, everything is rolled back. Keeping this coordination state requires a lot of memory.
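To make the coordination point concrete, here is a minimal single-process sketch of two-phase commit in Go (the language TiDB is written in). It is a toy under stated assumptions, not TiDB's or TiKV's implementation: the `participant` type, its `prewrite`/`commit`/`rollback` methods, and the in-memory maps are all hypothetical. It shows the idea from the reply: the coordinator must keep every buffered mutation until the outcome is known, and each participant also holds staged (locked) copies of its share, so the same bytes sit in memory in more than one place.

```go
package main

import "fmt"

// mutation is one buffered write (hypothetical type for this sketch).
type mutation struct {
	key, value string
}

// participant is a toy storage node: locks holds prewritten but
// uncommitted values, data holds committed values.
type participant struct {
	fail  bool // simulate a prewrite failure
	locks map[string]string
	data  map[string]string
}

func newParticipant(fail bool) *participant {
	return &participant{fail: fail, locks: map[string]string{}, data: map[string]string{}}
}

// prewrite stages the values and locks the keys (phase 1).
func (p *participant) prewrite(ms []mutation) bool {
	if p.fail {
		return false
	}
	for _, m := range ms {
		p.locks[m.key] = m.value
	}
	return true
}

// commit makes staged values visible and releases the locks (phase 2).
func (p *participant) commit() {
	for k, v := range p.locks {
		p.data[k] = v
		delete(p.locks, k)
	}
}

// rollback discards staged values.
func (p *participant) rollback() {
	for k := range p.locks {
		delete(p.locks, k)
	}
}

// twoPhaseCommit sends each participant its share of the mutations.
// The caller keeps all batches in memory until the outcome is known.
func twoPhaseCommit(batches map[*participant][]mutation) bool {
	prepared := []*participant{}
	for p, ms := range batches {
		if !p.prewrite(ms) {
			// Phase 1 failed: undo everything that was already prewritten.
			for _, q := range prepared {
				q.rollback()
			}
			return false
		}
		prepared = append(prepared, p)
	}
	for _, p := range prepared {
		p.commit()
	}
	return true
}

func main() {
	a, b := newParticipant(false), newParticipant(false)
	ok := twoPhaseCommit(map[*participant][]mutation{
		a: {{"k1", "v1"}},
		b: {{"k2", "v2"}},
	})
	fmt.Println(ok, a.data["k1"], b.data["k2"]) // true v1 v2

	c, d := newParticipant(false), newParticipant(true)
	ok = twoPhaseCommit(map[*participant][]mutation{
		c: {{"k3", "v3"}},
		d: {{"k4", "v4"}},
	})
	fmt.Println(ok, len(c.data), len(c.locks)) // false 0 0
}
```

A real system adds a primary key, timestamps, and crash recovery; the sketch only keeps the "all or nothing, with state held until the end" property.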

| username: 大鱼海棠

I still don't quite understand. You said a distributed system has multiple nodes, which suggests the memory amplification happens at the KV layer. So how much is memory amplified at the TiDB server layer? My understanding is this: in a write scenario, TiDB first performs a read, so the data read out occupies memory. Before the two-phase commit, the pending writes occupy memory; then come the index writes (including the reads needed for the index); and encoding the data into key-value pairs also takes memory. Is this understanding correct? Does that read operation fetch the full row records?
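The breakdown in this post can be turned into a rough back-of-the-envelope estimate. The sketch below is purely illustrative: the categories follow the post (rows read before writing, new row values buffered for commit, index entries), but the function `estimateUpdateMemory`, the `memoryEstimate` type, and the sizes in `main` are hypothetical assumptions, not TiDB's actual memory accounting. It only shows why memory scales as a multiple of the data being changed rather than 1x.

```go
package main

import "fmt"

// memoryEstimate is a hypothetical breakdown of TiDB-side memory for an UPDATE.
type memoryEstimate struct {
	ReadRows    int // full rows fetched before modification
	BufferedKVs int // new row values held until commit
	IndexKVs    int // per changed index: delete old entry + put new entry
}

// Total sums all categories.
func (m memoryEstimate) Total() int {
	return m.ReadRows + m.BufferedKVs + m.IndexKVs
}

// estimateUpdateMemory applies a simplified model: every row is read once,
// rewritten once, and each changed index yields one delete and one put.
func estimateUpdateMemory(rows, rowBytes, changedIndexes, indexEntryBytes int) memoryEstimate {
	return memoryEstimate{
		ReadRows:    rows * rowBytes,
		BufferedKVs: rows * rowBytes,
		IndexKVs:    rows * changedIndexes * 2 * indexEntryBytes,
	}
}

func main() {
	// Invented numbers: 1000 rows of 200 bytes, one changed index with 50-byte entries.
	est := estimateUpdateMemory(1000, 200, 1, 50)
	changed := 1000 * 200
	fmt.Printf("%+v total=%d ratio=%.1f\n", est, est.Total(), float64(est.Total())/float64(changed))
	// prints {ReadRows:200000 BufferedKVs:200000 IndexKVs:100000} total=500000 ratio=2.5
}
```

This model deliberately ignores key encoding overhead, per-entry bookkeeping in the transaction buffer, and intermediate results held by executors, all of which would likely push the real figure higher.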

| username: xfworld

For example, an UPDATE must first read the entire row, which loads the whole row into memory. Until the transaction commits, the modification is applied only to the copy in memory; once it commits, the data is persisted to TiKV.
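The read-modify-write flow described above can be sketched with a transaction-local write buffer. This is a simplified illustration, not TiDB's code: `store` stands in for committed data in TiKV, `txn` and its methods are hypothetical names, and rows are modeled as string maps. It shows that during an UPDATE the full row is copied into memory, modified there, and only written to the store on commit.

```go
package main

import "fmt"

// row is a full table row keyed by column name (simplified model).
type row map[string]string

// store stands in for committed data in TiKV (hypothetical).
type store map[string]row

// txn buffers modified rows in memory until commit.
type txn struct {
	st     store
	buffer map[string]row
}

func begin(st store) *txn { return &txn{st: st, buffer: map[string]row{}} }

// get reads from the buffer first, then from the store, and returns a copy
// of the whole row so that edits never touch committed data.
func (t *txn) get(key string) row {
	src, ok := t.buffer[key]
	if !ok {
		src = t.st[key]
	}
	cp := row{}
	for c, v := range src {
		cp[c] = v
	}
	return cp
}

// update loads the entire row, changes one column, and buffers the new row.
func (t *txn) update(key, col, val string) {
	r := t.get(key)
	r[col] = val
	t.buffer[key] = r
}

// commit persists every buffered row and clears the buffer.
func (t *txn) commit() {
	for k, r := range t.buffer {
		t.st[k] = r
	}
	t.buffer = map[string]row{}
}

func main() {
	st := store{"user:1": {"name": "a", "age": "30"}}
	tx := begin(st)
	tx.update("user:1", "age", "31")
	fmt.Println(st["user:1"]["age"]) // 30: not committed yet
	tx.commit()
	fmt.Println(st["user:1"]["age"]) // 31
}
```

Even though only one column changes, the whole row is held in memory twice (the committed version and the buffered copy) until commit, which is one source of the amplification discussed in this thread.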

| username: 大鱼海棠

So the memory amplification mentioned in the official documentation covers both TiDB and TiKV, right?

| username: xfworld

Of course. In the end it is all the same data.
