In a TiDB database, when updating one or more statements for a specific field, will the fields that are not updated be queried from the cache?

translator_bot · June 20, 2024, 9:10pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb数据库 update一条或多条语句中某个字段，没有被update的字段是否被会查询到缓存中？

| username: yulei7633

In a TiDB database, when updating one or more fields in a statement, will the fields that are not being updated be queried into the cache?
For example, in the following update statement:

UPDATE
  xxxxx
SET
  STATUS = ?,
  updatetime = ?
WHERE
  form_id = ?
  AND form_user_id IN (?)
  AND STATUS != ? [arguments: (2, 1712562485539, 559969, 984142949, 2)];

translator_bot · June 20, 2024, 9:10pm

| username: yulei7633 | Original post link

In a MySQL database, the update statement is equivalent to first deleting and then inserting (equivalent to all fields of this row being selected into the cache). But is it the same in TiDB?

translator_bot · June 20, 2024, 9:10pm

| username: TiDBer_QYr0vohO | Original post link

The reason for MVCC, isn’t TiDB all about inserts?

translator_bot · June 20, 2024, 9:10pm

| username: yulei7633 | Original post link

You’re right, it should be insert, but what I’m asking is whether the columns that haven’t been updated will be selected into the cache?

translator_bot · June 20, 2024, 9:10pm

| username: 裤衩儿飞上天 | Original post link

My personal understanding is that it will work. I roughly looked at the source code, and it seems that it first retrieves the old data, then combines the updated columns with the columns that were not updated to form a new value. This process will read the entire original value into memory. When I have time, I can read the source code in detail. It’s probably this one: tidb/pkg/executor/update.go at 3e2f2c5873eae1f5f2e4191f0f2064260e3b86ab · pingcap/tidb (github.com) I haven’t looked at it closely yet; I’ve just recently felt the urge to read the source code.

translator_bot · June 20, 2024, 9:10pm

| username: zhanggame1 | Original post link

TiDB uses RocksDB at the underlying level. Update, insert, and delete operations in RocksDB are all performed by inserting a new key-value.

translator_bot · June 20, 2024, 9:10pm

| username: yiduoyunQ | Original post link

Update: There is no dedicated blog, it is roughly similar to the description in this section.

translator_bot · June 20, 2024, 9:10pm

| username: zhaokede | Original post link

Learned.

translator_bot · June 20, 2024, 9:10pm

| username: miya | Original post link

The way TiDB updates data is to first retrieve the data from TiKV, then put it into the cache to update it, and finally persist the data back to TiKV. Moreover, TiDB uses the MVCC mechanism, and TiKV actually maintains multiple versions. For example, there are versions before and after the modification. For instance, during a query, if the TSO of the query is before the timestamp of the modification, it will find the data from the version before the modification. If it is after the timestamp of the modification, it will find the data from the version after the modification and will not search further. Therefore, I understand that when modifying data, the entire data row will still be fetched into the cache.

translator_bot · June 20, 2024, 9:10pm

| username: tidb菜鸟一只 | Original post link

Yes, TiDB update is also an insert. For fields that you haven’t updated, you need to retrieve them from the original KV and use them when re-inserting.

translator_bot · June 20, 2024, 9:10pm

| username: TiDBer_QYr0vohO | Original post link

The term “retrieve” should mean loading the entire value, which is an array of lists, into memory.

translator_bot · June 20, 2024, 9:10pm

| username: TIDB-Learner | Original post link

I personally think it will.

translator_bot · June 20, 2024, 9:10pm

| username: xiaoqiao | Original post link

Studied the principles.

translator_bot · June 20, 2024, 9:10pm

| username: Hacker_PtIIxHC1 | Original post link

Learned~

translator_bot · June 20, 2024, 9:10pm

| username: 人如其名 | Original post link

If it involves updating the key (the primary key of a clustered index table or the rowid of a non-clustered index table), the entire row record is first deleted (marked for insertion) and then inserted. If it does not involve updating the key, it is directly inserted.

In fact, what we more commonly use for large-scale data operations is delete. However, delete also requires reading the entire row of data into the TiDB server, mainly because it needs to synchronize the TiDB-binlog and maintain the index fields. I feel that this part can actually be optimized, but the cost is relatively high.

translator_bot · June 20, 2024, 9:10pm

| username: virvle | Original post link

It should be, each key-value is complete and multiple replicas are retained.

translator_bot · June 20, 2024, 9:10pm

| username: DBAER | Original post link

Learned.

translator_bot · June 20, 2024, 9:10pm

| username: zhang_2023 | Original post link

Learned

translator_bot · June 20, 2024, 9:10pm

| username: 友利奈绪 | Original post link

Extract key-value pairs, update them, and then persist.

translator_bot · June 20, 2024, 9:10pm

| username: dba远航 | Original post link

LSMTREE should operate on complete row values.