How is row-to-column transformation implemented in TiFlash?

translator_bot · June 23, 2024, 7:05am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash的行转列是如何实现的

| username: alfred


translator_bot · June 23, 2024, 7:05am

| username: xfworld | Original post link

The definition of metadata is different, see the image below:

Data written to TiKV in real time is also replicated to the TiFlash nodes through the Raft protocol (assuming replication to TiFlash has been configured for the table). TiFlash then processes the data it receives via Raft according to its own metadata definition.

For specific reference:

The code is open source, and you can study it yourself if interested.

translator_bot · June 23, 2024, 7:05am

| username: alfred | Original post link

Okay, thank you.

translator_bot · June 23, 2024, 7:05am

| username: wish-PingCAP | Original post link

TiFlash performs row-to-column conversion during Raft synchronization.

There are two types of data synchronized via Raft:

a. Data snapshots (Snapshot), which contain an entire Region's worth of data. They mainly come from newly added TiFlash replicas or from data imported as SST files by tools such as Lightning.

For snapshot data (for example, a 96 MB SST snapshot), TiFlash performs the row-to-column conversion in batch to generate DTFile files, and then executes a special Ingest Snapshot process.

b. Incremental KV data, mainly from write operations.

For incremental KV data, TiFlash reorganizes it in memory into Block-based in-memory data units, and then executes the standard write process.
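
To make the two paths above more concrete, here is a minimal Python sketch of the general idea. It is not TiFlash's actual code (TiFlash is written in C++ and its real data structures are far more involved); `rows_to_block`, `IncrementalBuffer`, and `flush_rows` are hypothetical names invented for illustration. The snapshot path is modeled as one batch conversion of many rows, and the incremental path as small writes buffered in memory and turned into column-oriented blocks.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]            # one decoded row: column name -> value
Block = Dict[str, List[Any]]    # columnar unit: column name -> list of values


def rows_to_block(rows: List[Row], columns: List[str]) -> Block:
    """Pivot row-oriented records into one value list per column."""
    block: Block = {name: [] for name in columns}
    for row in rows:
        for name in columns:
            # Missing fields become None so all columns keep the same length.
            block[name].append(row.get(name))
    return block


class IncrementalBuffer:
    """Hypothetical in-memory buffer for incremental writes (illustration only)."""

    def __init__(self, columns: List[str], flush_rows: int) -> None:
        self.columns = columns
        self.flush_rows = flush_rows      # assumed threshold, not a TiFlash setting
        self.pending: List[Row] = []      # rows not yet converted
        self.blocks: List[Block] = []     # converted columnar blocks

    def write(self, row: Row) -> None:
        """Buffer one row; convert to a block once enough rows have arrived."""
        self.pending.append(row)
        if len(self.pending) >= self.flush_rows:
            self.flush()

    def flush(self) -> None:
        """Convert all pending rows into a single columnar block."""
        if self.pending:
            self.blocks.append(rows_to_block(self.pending, self.columns))
            self.pending = []


if __name__ == "__main__":
    columns = ["id", "name", "score"]

    # Snapshot-like path: convert a large batch of rows in one pass.
    snapshot_rows = [
        {"id": 1, "name": "a", "score": 90},
        {"id": 2, "name": "b", "score": 85},
    ]
    print(rows_to_block(snapshot_rows, columns))

    # Incremental path: small writes accumulate, then become blocks.
    buf = IncrementalBuffer(columns, flush_rows=2)
    buf.write({"id": 3, "name": "c", "score": 70})
    buf.write({"id": 4, "name": "d", "score": 60})
    print(buf.blocks)
```

The point of the sketch is only the difference in shape: the snapshot path pays the conversion cost once for a whole Region, while the incremental path spreads it over many small in-memory batches.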

translator_bot · June 23, 2024, 7:05am

| username: alfred | Original post link

So under normal circumstances, most of the data should arrive as incremental KV data. After all, batch row-to-column conversion of a large snapshot should consume a lot of resources.
