Some Questions Regarding Data Update Issues in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB涉及数据更新问题得一些疑问

| username: TiDBer_JlY1JCJ5

Is TiFlash in TiDB sensitive to updates? In other words, will frequent updates cause problems? Columnar databases are traditionally unfriendly to updates, so the usual advice is to avoid updating them frequently. Does TiDB have any optimizations for frequent updates? I’m also not sure how RocksDB performs on update operations. Could you please share some insights? Thank you.

| username: zhanggame1 | Original post link

TiFlash handles updates the same way as TiKV. In TiKV, the replicas of each Region take either the leader or the follower role. TiFlash acts as a learner, which is similar to a follower except that it does not participate in voting.

| username: WalterWj | Original post link

When TiDB updates data, TiFlash performs an append operation instead of a modification.

| username: TiDBer_JlY1JCJ5 | Original post link

If updates are very frequent, won’t all those appends cause performance issues on TiFlash?

| username: WalterWj | Original post link

As long as the TiFlash hardware is decent, there shouldn’t be a big problem, because TiFlash synchronizes at the Region level. If there are no hotspots, you can think of the synchronization between TiKV and TiFlash as many-to-many.

The upper limit of this synchronization throughput is quite high, but it is still best to test it with your own workload.

| username: xingzhenxiang | Original post link

Isn’t a TiKV update also an insertion of new data?

| username: andone | Original post link

TiFlash uses append operations. Additionally, data replication between TiKV and TiFlash is asynchronous.

| username: tidb菜鸟一只 | Original post link

An update in TiKV actually inserts new data and marks the old data as a historical version. TiFlash asynchronously replicates TiKV’s Regions through the Learner protocol, which performs relatively well.
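To make the "insert new data, keep the old data as a historical version" idea concrete, here is a minimal Python sketch of an append-only multi-version store. It is only an illustration of the general technique: the class `VersionedStore`, its methods, and the integer timestamps are all hypothetical and are not TiKV code or APIs.

```python
from collections import defaultdict


class VersionedStore:
    """Toy append-only multi-version store (illustrative only, not TiKV code)."""

    def __init__(self):
        # key -> list of (commit_ts, value), in ascending commit_ts order
        self._versions = defaultdict(list)
        self._ts = 0  # hypothetical logical clock

    def put(self, key, value):
        """An 'update' never overwrites: it appends a new version."""
        self._ts += 1
        self._versions[key].append((self._ts, value))
        return self._ts

    def get(self, key, read_ts=None):
        """Return the newest version whose commit_ts is <= read_ts."""
        if read_ts is None:
            read_ts = self._ts
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= read_ts:
                return value
        return None

    def version_count(self, key):
        return len(self._versions.get(key, []))


store = VersionedStore()
t1 = store.put("row1", "a")
t2 = store.put("row1", "b")          # the update is stored as a second version
latest = store.get("row1")           # reads the newest version
historical = store.get("row1", t1)   # the old version is still readable at t1
```

In this sketch the old value stays readable at its original timestamp until something cleans it up, which is why such designs need a separate cleanup step for historical versions.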

| username: TiDBer_G2SHLw9o | Original post link

I’ve also been researching TiDB recently. Updates are generally expensive for columnar storage. From reading the TiDB documentation, my understanding is that TiFlash works in a similar way: it appends new data and merges it later. TiDB refers to this as GC, which by default runs every 10 minutes to perform the merge. However, I haven’t found any information on whether frequent updates affect the read and write performance of TiKV and TiFlash, so I’m not sure how this process performs. I look forward to a detailed explanation from the experts. How does it do in benchmarks? The official benchmark includes a chart, but there is no report on TiFlash performance.
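The "append first, merge later" pattern described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not TiFlash’s actual storage engine: the class `DeltaStableTable`, the `delta` and `stable` layer names, and the manual `merge()` call are all chosen for illustration.

```python
class DeltaStableTable:
    """Toy two-layer store: an append-only delta plus a merged stable layer."""

    def __init__(self):
        self.stable = {}  # key -> value, already merged
        self.delta = []   # appended (key, value) writes, oldest first; None means delete

    def write(self, key, value):
        """Writes and updates are cheap: just append to the delta."""
        self.delta.append((key, value))

    def read(self, key):
        """Reads check the delta newest-first, then fall back to the stable layer."""
        for k, v in reversed(self.delta):
            if k == key:
                return v
        return self.stable.get(key)

    def merge(self):
        """Fold the delta into the stable layer, keeping only the latest value per key."""
        for k, v in self.delta:
            if v is None:
                self.stable.pop(k, None)
            else:
                self.stable[k] = v
        self.delta.clear()


table = DeltaStableTable()
for i in range(1000):
    table.write("row1", i)            # 1000 frequent updates to the same row

delta_before = len(table.delta)       # every update is still sitting in the delta
value_before = table.read("row1")
table.merge()
delta_after = len(table.delta)
value_after = table.read("row1")
```

The sketch shows the trade-off being asked about: frequent updates keep writes cheap, but reads have more unmerged data to go through until a merge runs, so the frequency and cost of merging matter for read performance.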

| username: 托马斯滑板鞋 | Original post link

It does not affect the performance of TiKV. Regions on TiFlash receive logs asynchronously in learner mode, but data consistency is still guaranteed. For example, suppose that at a certain point in time TiFlash has synchronized 3 log entries and 7 are still outstanding: TiFlash waits until all of them have been synchronized before returning the query result. If you want more performance, enable FastScan.
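The "3 logs synchronized, 7 remaining" example can be sketched as a small single-threaded simulation in Python. Everything here is hypothetical and simplified: `LearnerReplica`, `receive`, `apply_next`, and `read_index` are illustrative names, and a real system would wait for missing entries to arrive instead of raising an error.

```python
class LearnerReplica:
    """Toy learner that applies a replicated log lazily (not TiFlash code)."""

    def __init__(self):
        self.data = {}
        self.applied_index = 0  # index of the last applied log entry
        self.pending = []       # received but not yet applied (index, key, value)

    def receive(self, index, key, value):
        """Log entries arrive asynchronously and are queued."""
        self.pending.append((index, key, value))

    def apply_next(self):
        index, key, value = self.pending.pop(0)
        self.data[key] = value
        self.applied_index = index

    def read(self, key, read_index):
        """Serve a read only after the log is applied up to read_index."""
        while self.applied_index < read_index:
            if not self.pending:
                # Simplification: a real replica would block until entries arrive.
                raise RuntimeError("entries up to read_index have not arrived yet")
            self.apply_next()
        return self.data.get(key)


replica = LearnerReplica()
for i in range(1, 11):
    replica.receive(i, "k", f"v{i}")  # 10 log entries in total

for _ in range(3):
    replica.apply_next()              # only 3 have been applied so far

applied_before = replica.applied_index
result = replica.read("k", read_index=10)  # catches up on the other 7 first
applied_after = replica.applied_index
```

The point of the sketch is that replication lag shows up as extra read latency on the learner rather than as stale results, and it adds no work on the write path of the source.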

| username: dba远航 | Original post link

TiFlash uses columnar storage, and incoming data is compressed. Its mechanism is somewhat different from that of regular databases, so the performance impact of updates is not very noticeable.