Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 事务内Delete相关问题 (Questions about DELETE inside a transaction)
According to the official blog, all DML operations are applied to the transaction's local buffer. Does that mean a DELETE also only touches the buffer, and nothing is actually removed from the KV store before commit? If so, wouldn't a SELECT issued after the DELETE in the same transaction still read the deleted row from the KV store? In practice this does not happen. How is it handled?
It is recommended to look into the implementation of distributed transactions and MVCC. TiDB ensures consistent reads of data.
Deleting within a transaction can be very slow. It is recommended to use batch mode for deletion.
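To illustrate the batching idea, here is a minimal sketch that deletes rows in small chunks, each committed as its own transaction. It uses Python's built-in sqlite3 purely as a stand-in so that it runs anywhere; the `events` table, the `expired` column, and the `delete_in_batches` helper are made up for the example and are not TiDB features.

```python
import sqlite3

def delete_in_batches(conn, batch_size):
    """Delete expired rows in small transactions; return how many batches deleted rows."""
    batches = 0
    while True:
        with conn:  # commits (or rolls back) each batch as its own transaction
            cur = conn.execute(
                "DELETE FROM events WHERE id IN ("
                " SELECT id FROM events WHERE expired = 1 LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:  # nothing left to delete
            break
        batches += 1
    return batches

# Hypothetical data: ids 0..9, even ids are expired (5 rows to delete).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, expired INTEGER)")
conn.executemany(
    "INSERT INTO events (id, expired) VALUES (?, ?)",
    [(i, 1 if i % 2 == 0 else 0) for i in range(10)],
)
conn.commit()

batches = delete_in_batches(conn, batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Against TiDB the same loop shape would apply through whichever MySQL-compatible driver you use; the point is only that each batch stays small and commits separately.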
The delete operation is itself the insertion of a key-value pair. When you SELECT after a DELETE, the lookup in TiKV finds a delete record for the original key, which indicates that the key has been deleted.
You said the delete is also a KV insertion. Is that KV written to the transaction buffer before commit, or directly to TiKV?
Consistency ensures that deleted data will not be read.
Deletion is not a physical deletion; the data is only marked as deleted. While the deleting transaction is uncommitted, other sessions can still read the data. Once it has committed, transactions whose snapshot was taken before the delete committed can still read the data, while transactions that start afterwards cannot. This follows from MVCC (snapshot reads) and the ACID properties of transactions. It is worth reading up on TiDB's distributed read/write principles and its timestamp, memory, and log management mechanisms. Let's encourage each other.
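The visibility rule described above can be sketched as a toy multi-version store. This is a simplified model under stated assumptions, not TiKV's actual implementation: `MvccStore`, `TOMBSTONE`, and the integer timestamps are all hypothetical names chosen for illustration.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a delete record

class MvccStore:
    """Toy multi-version store: each key maps to a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}

    def commit(self, key, value, commit_ts):
        # A write never overwrites; it appends a new version.
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort(key=lambda version: version[0])

    def delete(self, key, commit_ts):
        # A delete is just another write whose value is the delete marker.
        self.commit(key, TOMBSTONE, commit_ts)

    def get(self, key, start_ts):
        # A reader sees the newest version committed at or before its snapshot.
        visible = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts > start_ts:
                break
            visible = value
        return None if visible is TOMBSTONE else visible

store = MvccStore()
store.commit("k", "v1", commit_ts=10)
store.delete("k", commit_ts=20)

old_reader = store.get("k", start_ts=15)  # snapshot taken before the delete committed
new_reader = store.get("k", start_ts=25)  # snapshot taken after the delete committed
```

Here `old_reader` still gets `"v1"` while `new_reader` gets `None`, even though the old version is physically still in the store.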
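During a KV scan, versions that are not visible at the read timestamp and keys that carry delete markers are skipped. The execution information shown by EXPLAIN ANALYZE reports how many keys were skipped, in fields such as key_skipped_count and delete_skipped_count.
Transactions that delete data can be very slow.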
Deleting within a transaction is too slow and does not immediately release space, so usually only logical deletion is performed.
In the execution plan, you will see the skip operation, which refers to the records marked for deletion. This means they are filtered out during the reading phase.
Looking forward to an article from the expert explaining it clearly.
Reads are consistent, and deleted rows are only marked for deletion.
On commit, data is written in LSM-tree fashion, as appended records. A delete also appends a new record, and the historical data is only physically removed during the merge (compaction) process.
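As a rough illustration of the LSM-style behavior described above, the sketch below appends a delete marker to a log and only drops the old data when the log is merged. It is deliberately simplified: `DELETED`, `compact`, and the record layout are assumptions for the example, and it ignores the fact that a real MVCC store must keep older versions around until no snapshot needs them.

```python
DELETED = "<deleted>"  # hypothetical delete marker

def compact(log):
    """Merge an append-only log of (key, ts, value) records.

    Keeps only the newest record per key and drops keys whose newest
    record is a delete marker.
    """
    latest = {}
    for key, ts, value in log:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return sorted(
        (key, ts, value)
        for key, (ts, value) in latest.items()
        if value != DELETED
    )

log = [
    ("a", 1, "x"),
    ("b", 2, "y"),
    ("a", 3, DELETED),  # deleting "a" appends a record; nothing is removed yet
    ("b", 4, "z"),
]
size_before = len(log)
merged = compact(log)
```

Before the merge the log has grown by one record because of the delete; only after `compact` does the space for key `"a"` actually go away, which matches the earlier remark that deletes do not release space immediately.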
TiDB uses MVCC to handle concurrent reads and writes. When performing a DELETE operation, it actually creates a new version of the row marked as deleted. Subsequent SELECT operations read data according to transaction visibility rules and typically will not see the deleted row.
You can take a look at this article. Before commit, this KV is held in the transaction's local cache.
Additionally, TiDB does not support READ-UNCOMMITTED. So you don’t need to worry about other threads needing to access this part of the data in the write cache.
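To make the answer to the original question concrete, here is a minimal sketch of a transaction that checks its own write buffer before falling back to its snapshot. It is a toy model under assumptions, not TiDB's code: `Txn`, `TOMBSTONE`, and the plain dict used as the committed store are all hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker for a buffered delete

class Txn:
    """Toy transaction: buffered writes overlay a read-only snapshot."""

    def __init__(self, snapshot):
        self.snapshot = snapshot  # committed data as of transaction start
        self.buffer = {}          # uncommitted writes, private to this transaction

    def put(self, key, value):
        self.buffer[key] = value

    def delete(self, key):
        # The delete is recorded in the buffer; the store is untouched.
        self.buffer[key] = TOMBSTONE

    def get(self, key):
        # Reads consult the buffer first, so the transaction sees its own delete.
        if key in self.buffer:
            value = self.buffer[key]
            return None if value is TOMBSTONE else value
        return self.snapshot.get(key)

    def commit(self, store):
        # Only now do the buffered changes reach the shared store.
        for key, value in self.buffer.items():
            if value is TOMBSTONE:
                store.pop(key, None)
            else:
                store[key] = value
        self.buffer.clear()

store = {"k": "v"}
txn = Txn(dict(store))
txn.delete("k")

own_read = txn.get("k")                 # the deleting transaction's own view
other_read = Txn(dict(store)).get("k")  # another transaction before commit
txn.commit(store)
```

Before the commit, `own_read` is `None` while `other_read` is still `"v"`, since the second transaction never looks into the first one's buffer. After the commit, `"k"` is gone from `store`.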