PCTA Study Notes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: PCTA学习笔记

| username: TiDBer_H5NdJb5Q

Course Name: Core Principles and Architecture of TiDB Database (101)

Study Duration:

500 minutes

Course Gains:

Architecture of various TiDB components

Course Content:

What kind of database do we need?

  1. Scalability

  2. Strong consistency and high availability: RPO (Recovery Point Objective) data loss amount, RTO (Recovery Time Objective) system recovery time. RPO=0, RTO sufficiently small

  3. Standard SQL and ACID transactions

  4. Cloud-native

  5. HTAP (Hybrid Transactional/Analytical Processing)

  6. Compatibility with mainstream ecosystems and protocols

Common foundational factors in the data technology stack:

  1. Data model

  2. Data storage and retrieval

  3. Data format

  4. Storage engine

  5. Replication protocol

  6. Distributed transaction model

  7. Data architecture

  8. Optimizer algorithms

  9. Execution engine

Separation of computation and storage

Hardware, especially network development, has driven the separation of computation and storage.










What kind of storage engine do we need?

  1. Finer-grained elastic scalability

  2. High concurrent read and write

  3. No data loss or errors

  4. Multiple replicas to ensure consistency and high availability

  5. Support for distributed transactions

Choosing data structures

LSM-tree: Using space to trade for write latency, replacing random writes with sequential writes

Writing is the most expensive cost

LSM-tree supports batch writing and lock-free snapshots

Data replication

Raft

How to achieve scalability

Pre-sharding (static sharding) or auto-sharding (dynamic sharding): use auto-sharding

Sharding algorithms: hash, range, list: use range

Region is a range shard

PD balances the scheduling of regions

Distributed transactions: decentralized two-phase commit

TiDB transaction isolation: SI (Snapshot Isolation), RR supports RC

Default optimistic locking

How to build a distributed SQL engine

Small table cache

Data volume not large, below 64MB

Read-only or infrequently modified

Table accessed very frequently

Lease

tidb_table_cache_lease = 5 seconds

Cannot write before lease expires, writing will block

Enabling small table cache prevents DDL operations, mutual exclusion

First write to WAL, then to memtable.

When write_buffer_size is reached, memtable will convert to immutable memtable

Immutable memtable prevents write blocking when memtable directly converts to SST

If there is one immutable memtable, it will flush to disk. If there are five without persisting to SST, it will cause write blocking (write stall) flow control

For data less than 255 bytes, directly write to the write column

Issues or Extended Thoughts During Learning:

  • Issue 1: Difference between clustered and non-clustered indexes
    Using MySQL as an example, a clustered index is an index sorted by the primary key of the table, and a table can only have one clustered index. The data storage method of a clustered index in the table is to store the entire table’s data in a B+ tree and sort it by the primary key.
    A non-clustered index is an index sorted by non-primary key fields, and a table can have multiple non-clustered indexes. The data storage method of a non-clustered index in the table is to store the value of the index field and the address of the data row in a B+ tree.
    Therefore, if querying data by non-primary key fields, you can locate the disk page of the data row that meets the conditions through the non-clustered index and then directly retrieve the data of that row, thus being relatively efficient.