Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb数据库 update一条或多条语句中某个字段,没有被update的字段是否被会查询到缓存中?
In a TiDB database, when updating one or more fields in a statement, will the fields that are not being updated be queried into the cache?
For example, in the following update statement:
UPDATE
xxxxx
SET
STATUS = ?,
updatetime = ?
WHERE
form_id = ?
AND form_user_id IN (?)
AND STATUS != ? [arguments: (2, 1712562485539, 559969, 984142949, 2)];
In a MySQL database, the update statement is equivalent to first deleting and then inserting (equivalent to all fields of this row being selected into the cache). But is it the same in TiDB?
The reason for MVCC, isn’t TiDB all about inserts?
You’re right, it should be insert, but what I’m asking is whether the columns that haven’t been updated will be selected into the cache?
My personal understanding is that it will work. I roughly looked at the source code, and it seems that it first retrieves the old data, then combines the updated columns with the columns that were not updated to form a new value. This process will read the entire original value into memory.
When I have time, I can read the source code in detail. It’s probably this one: tidb/pkg/executor/update.go at 3e2f2c5873eae1f5f2e4191f0f2064260e3b86ab · pingcap/tidb (github.com) I haven’t looked at it closely yet; I’ve just recently felt the urge to read the source code.
TiDB uses RocksDB at the underlying level. Update, insert, and delete operations in RocksDB are all performed by inserting a new key-value.
Update: There is no dedicated blog, it is roughly similar to the description in this section.
The way TiDB updates data is to first retrieve the data from TiKV, then put it into the cache to update it, and finally persist the data back to TiKV. Moreover, TiDB uses the MVCC mechanism, and TiKV actually maintains multiple versions. For example, there are versions before and after the modification. For instance, during a query, if the TSO of the query is before the timestamp of the modification, it will find the data from the version before the modification. If it is after the timestamp of the modification, it will find the data from the version after the modification and will not search further. Therefore, I understand that when modifying data, the entire data row will still be fetched into the cache.
Yes, TiDB update is also an insert. For fields that you haven’t updated, you need to retrieve them from the original KV and use them when re-inserting.
The term “retrieve” should mean loading the entire value, which is an array of lists, into memory.
I personally think it will.
If it involves updating the key (the primary key of a clustered index table or the rowid of a non-clustered index table), the entire row record is first deleted (marked for insertion) and then inserted. If it does not involve updating the key, it is directly inserted.
In fact, what we more commonly use for large-scale data operations is delete. However, delete also requires reading the entire row of data into the TiDB server, mainly because it needs to synchronize the TiDB-binlog and maintain the index fields. I feel that this part can actually be optimized, but the cost is relatively high.
It should be, each key-value is complete and multiple replicas are retained.
Extract key-value pairs, update them, and then persist.
LSMTREE should operate on complete row values.