In a TiDB database, when updating one or more statements for a specific field, will the fields that are not updated be queried from the cache?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb数据库 update一条或多条语句中某个字段,没有被update的字段是否被会查询到缓存中?

| username: yulei7633

In a TiDB database, when updating one or more fields in a statement, will the fields that are not being updated be queried into the cache?
For example, in the following update statement:

  STATUS = ?,
  updatetime = ?
  form_id = ?
  AND form_user_id IN (?)
  AND STATUS != ? [arguments: (2, 1712562485539, 559969, 984142949, 2)];
| username: yulei7633 | Original post link

In a MySQL database, the update statement is equivalent to first deleting and then inserting (equivalent to all fields of this row being selected into the cache). But is it the same in TiDB?

| username: TiDBer_QYr0vohO | Original post link

The reason for MVCC, isn’t TiDB all about inserts?

| username: yulei7633 | Original post link

You’re right, it should be insert, but what I’m asking is whether the columns that haven’t been updated will be selected into the cache?

| username: 裤衩儿飞上天 | Original post link

My personal understanding is that it will work. I roughly looked at the source code, and it seems that it first retrieves the old data, then combines the updated columns with the columns that were not updated to form a new value. This process will read the entire original value into memory. :crazy_face: When I have time, I can read the source code in detail. It’s probably this one: tidb/pkg/executor/update.go at 3e2f2c5873eae1f5f2e4191f0f2064260e3b86ab · pingcap/tidb ( I haven’t looked at it closely yet; I’ve just recently felt the urge to read the source code.

| username: zhanggame1 | Original post link

TiDB uses RocksDB at the underlying level. Update, insert, and delete operations in RocksDB are all performed by inserting a new key-value.

| username: yiduoyunQ | Original post link

Update: There is no dedicated blog, it is roughly similar to the description in this section.

| username: zhaokede | Original post link


| username: miya | Original post link

The way TiDB updates data is to first retrieve the data from TiKV, then put it into the cache to update it, and finally persist the data back to TiKV. Moreover, TiDB uses the MVCC mechanism, and TiKV actually maintains multiple versions. For example, there are versions before and after the modification. For instance, during a query, if the TSO of the query is before the timestamp of the modification, it will find the data from the version before the modification. If it is after the timestamp of the modification, it will find the data from the version after the modification and will not search further. Therefore, I understand that when modifying data, the entire data row will still be fetched into the cache.

| username: tidb菜鸟一只 | Original post link

Yes, TiDB update is also an insert. For fields that you haven’t updated, you need to retrieve them from the original KV and use them when re-inserting.

| username: TiDBer_QYr0vohO | Original post link

The term “retrieve” should mean loading the entire value, which is an array of lists, into memory.

| username: TIDB-Learner | Original post link

I personally think it will.

| username: xiaoqiao | Original post link

Studied the principles.

| username: Hacker_PtIIxHC1 | Original post link

Learned~ :+1:

| username: 人如其名 | Original post link

If it involves updating the key (the primary key of a clustered index table or the rowid of a non-clustered index table), the entire row record is first deleted (marked for insertion) and then inserted. If it does not involve updating the key, it is directly inserted.

In fact, what we more commonly use for large-scale data operations is delete. However, delete also requires reading the entire row of data into the TiDB server, mainly because it needs to synchronize the TiDB-binlog and maintain the index fields. I feel that this part can actually be optimized, but the cost is relatively high.

| username: virvle | Original post link

It should be, each key-value is complete and multiple replicas are retained.

| username: DBAER | Original post link


| username: zhang_2023 | Original post link

Learned :+1:

| username: 友利奈绪 | Original post link

Extract key-value pairs, update them, and then persist.

| username: dba远航 | Original post link

LSMTREE should operate on complete row values.