Course Notes: Getting Started with TiDB (Intermediate)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 课程笔记 TiDB 快速起步(中)

| username: 云散月明

Course Link

TiDB Quick Start

Course Outline

  • History of Database, Big Data, and TiDB Development

    • 01: History and Trends of Database and Big Data Development

    • 02: Development of Distributed Relational Databases

    • 03: Evolution of TiDB Products and Open Source Community

  • Overview of TiDB

    • 04: What Kind of Database Do We Really Need

    • 05: How to Build a Distributed Storage System

    • 06: How to Build a Distributed SQL Engine

  • Selection of Next-Generation HTAP Database

    • 07: HTAP Database Based on Distributed Architecture

    • 08: Key Technological Innovations of TiDB

    • 09: Typical Application Scenarios and User Cases of TiDB

  • First Experience with TiDB

    • 10: First Experience with TiDB

Course Notes (in progress)

  1. Storage Engine

    1. Finer-grained Elastic Scaling

    2. High Concurrent Read and Write

    3. No Data Loss or Errors

    4. Multi-replica Ensures Consistency and High Availability

    5. Supports Distributed Transactions

  2. Data Structure is the Core Fundamental Technology of Databases

    1. BTree and LSM-tree

      1. The LSM-tree essentially trades space for lower write latency, replacing random writes with sequential writes.

      2. A single TiKV node uses the RocksDB engine, which is based on the LSM-tree.
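To make the LSM-tree idea above concrete, here is a minimal sketch. It is not how RocksDB is implemented; `TinyLSM`, `memtable_limit`, and the in-memory "sorted runs" are hypothetical names chosen for illustration. Writes land in a memtable and are flushed as one sorted batch (the sequential write), while overwritten values linger in older runs until compaction (the space cost).

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}      # absorbs writes in memory
        self.sstables = []      # sorted runs of (key, value); newest run is last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run instead of many random in-place updates.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # A read may have to check several runs, newest first.
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs, keeping the newest value per key, to reclaim space.
        merged = {}
        for run in self.sstables:   # oldest to newest, so later values win
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, flushed as the first run
db.put("a", 3)
db.put("c", 4)      # second flush; the stale ("a", 1) still occupies space
runs_before = len(db.sstables)
db.compact()
```

After `compact()` the two runs collapse into one and the stale value of `"a"` is dropped, which is the point where the extra space is paid back.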

  3. Data Replication

    1. Consensus Algorithms: Raft, Paxos

    2. Implementation of Scaling: Pre-sharding (static), Auto-sharding (dynamic)

    3. Sharding Algorithms: hash, range, list

      1. Range Sharding

        1. More Efficient Data Scanning

        2. Simple Implementation of Automatic Splitting and Merging

        3. Elasticity First: Shards Can Be Scheduled Automatically

        4. May Encounter Hot Shard Issues
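The range-sharding properties listed above can be sketched in a few lines. This is an illustration only, not TiKV's code; `RangeShards` and `max_keys_per_shard` are hypothetical names, and a real system splits by size rather than key count. Each shard owns a contiguous key range, so a scan touches only the overlapping shards and a split just picks a middle key.

```python
import bisect


class RangeShards:
    """Toy range-sharded map: shard i owns keys in [starts[i], starts[i+1])."""

    def __init__(self, max_keys_per_shard=2):
        self.starts = [""]      # sorted shard start keys; "" acts as minus infinity
        self.shards = [{}]
        self.max_keys = max_keys_per_shard

    def locate(self, key):
        # Index of the rightmost shard whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key, value):
        i = self.locate(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]      # the split point becomes the new shard's start key
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i] = left
        self.starts.insert(i + 1, mid)
        self.shards.insert(i + 1, right)

    def scan(self, start, end):
        # Range scan over [start, end): only shards overlapping the range are visited.
        first = self.locate(start)
        out = []
        for i in range(first, len(self.starts)):
            if i > first and self.starts[i] >= end:
                break
            out.extend((k, v) for k, v in sorted(self.shards[i].items())
                       if start <= k < end)
        return out


shards = RangeShards(max_keys_per_shard=2)
for n in range(1, 5):
    shards.put(f"k{n}", n)      # monotonically increasing keys
```

Because the keys here only ever increase, every new write lands in the last shard. That is the hot shard issue from the list: one shard takes all the write traffic until it splits again.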

  4. TiKV

    1. A Multi-Raft System; Data is Split into Regions (96 MiB by default)

    2. Each Region is a key range from StartKey to EndKey, a left-closed, right-open interval: [StartKey, EndKey).

    3. Data storage, access, replication, and scheduling are all done per Region

    4. Multi-version Control: TiKV’s MVCC is implemented by appending a version number to the Key

    5. Coprocessor is the module in TiKV that reads data and performs computation; each TiKV storage node has a Coprocessor
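The MVCC point above (a version number attached to the key) can be illustrated with a small sketch. `MVCCStore` is a hypothetical name, and storing the version negated so that newer versions sort first is an assumption made for this example; TiKV's actual byte encoding differs.

```python
import bisect


class MVCCStore:
    """Toy MVCC store: every write is kept under the composite key (key, -version)."""

    def __init__(self):
        self.keys = []      # sorted composite keys; newer versions of a key sort first
        self.values = {}    # composite key -> value

    def put(self, key, version, value):
        composite = (key, -version)
        if composite not in self.values:
            bisect.insort(self.keys, composite)
        self.values[composite] = value

    def get(self, key, read_version):
        # Seek to the newest version of `key` that is <= read_version.
        i = bisect.bisect_left(self.keys, (key, -read_version))
        if i < len(self.keys) and self.keys[i][0] == key:
            return self.values[self.keys[i]]
        return None


store = MVCCStore()
store.put("a", 10, "a@10")
store.put("a", 20, "a@20")
store.put("b", 15, "b@15")
```

A reader at version 15 sees `"a@10"` even though a newer value exists, because old versions are never overwritten in place. Each lookup is a single seek in the sorted key space, which fits an ordered store like the one underneath.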

  5. Distributed Transaction Model

    1. Decentralized Two-Phase Commit

    2. Google Percolator Transaction Model

    3. TiKV Supports a Full Transactional KV API

    4. Default Optimistic Transaction Model

    5. Default Isolation Level: Snapshot Isolation
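The transaction model above can be sketched as a heavily simplified, single-process, Percolator-style flow. This is not TiKV's implementation: `Store`, `Txn`, and `Conflict` are hypothetical names, and primary locks, lock checks on reads, and crash recovery are all omitted. It shows only the optimistic part (conflicts surface at commit time), the two phases, and snapshot reads at `start_ts`.

```python
import itertools


class Conflict(Exception):
    pass


class Store:
    def __init__(self):
        self._ts = itertools.count(1)   # stand-in for a timestamp oracle
        self.locks = {}                 # key -> start_ts of the transaction holding it
        self.writes = {}                # key -> [(commit_ts, value), ...] in ascending order

    def next_ts(self):
        return next(self._ts)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()
        self.buffer = {}                # optimistic: writes are buffered until commit

    def put(self, key, value):
        self.buffer[key] = value

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        # Snapshot read: newest committed version with commit_ts <= start_ts.
        for commit_ts, value in reversed(self.store.writes.get(key, [])):
            if commit_ts <= self.start_ts:
                return value
        return None

    def commit(self):
        s = self.store
        locked = []
        try:
            # Phase 1 (prewrite): lock each key, failing on a lock or a newer commit.
            for key in self.buffer:
                versions = s.writes.get(key, [])
                if key in s.locks or (versions and versions[-1][0] > self.start_ts):
                    raise Conflict(key)
                s.locks[key] = self.start_ts
                locked.append(key)
        except Conflict:
            for key in locked:          # roll back our own locks
                del s.locks[key]
            raise
        # Phase 2 (commit): publish the values at commit_ts and release the locks.
        commit_ts = s.next_ts()
        for key, value in self.buffer.items():
            s.writes.setdefault(key, []).append((commit_ts, value))
            del s.locks[key]
        return commit_ts


s = Store()
t1, t2 = Txn(s), Txn(s)
t1.put("x", 1)
t2.put("x", 2)
t1.commit()
try:
    t2.commit()
    t2_result = "committed"
except Conflict:
    t2_result = "conflict"      # t1 committed "x" after t2 started
```

The coordinator here is the transaction object itself, with no central transaction manager, which is the "decentralized" part of the two-phase commit in the list.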

  6. Implementing Logical Tables on KV, Secondary Index Based on KV
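As an illustration of mapping a logical table and a secondary index onto KV pairs, the sketch below uses a human-readable key layout, `t{table_id}_r{row_id}` for rows and `t{table_id}_i{index_id}_{value}_{row_id}` for non-unique index entries. The function names and the Python dict standing in for the KV store are assumptions for this example, and the real keys are binary-encoded rather than formatted strings.

```python
def row_key(table_id, row_id):
    # One KV pair per row: the key identifies the row, the value holds the columns.
    return f"t{table_id}_r{row_id}"


def index_key(table_id, index_id, value, row_id):
    # Non-unique index entry: the row ID is part of the key, the value is empty.
    return f"t{table_id}_i{index_id}_{value}_{row_id}"


def insert_row(kv, table_id, row_id, row, indexes):
    """`indexes` maps a hypothetical index ID to the column it covers."""
    kv[row_key(table_id, row_id)] = row
    for index_id, column in indexes.items():
        kv[index_key(table_id, index_id, row[column], row_id)] = None


def lookup_by_index(kv, table_id, index_id, value):
    # A prefix scan over the index entries, then a point lookup per row ID.
    prefix = f"t{table_id}_i{index_id}_{value}_"
    row_ids = [k[len(prefix):] for k in sorted(kv) if k.startswith(prefix)]
    return [kv[f"t{table_id}_r{rid}"] for rid in row_ids]


kv = {}
indexes = {1: "age"}    # index 1 on column "age"
insert_row(kv, 10, 1, {"name": "TiDB", "age": 10}, indexes)
insert_row(kv, 10, 2, {"name": "TiKV", "age": 10}, indexes)
insert_row(kv, 10, 3, {"name": "PD", "age": 5}, indexes)
names = [row["name"] for row in lookup_by_index(kv, 10, 1, 10)]
```

Because rows of one table share a key prefix and are ordered by row ID, a table scan becomes a range scan over the KV store, which is where range sharding pays off.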

  7. Cost-based Optimizer

  8. Main Optimization Strategy of Distributed SQL Engine: Push Down
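A rough sketch of what push-down means for an aggregate: each storage node filters and pre-aggregates its own data, and the SQL layer only merges the small partial results. The function names and the `amount` column are hypothetical, and plain lists stand in for the data held by each storage node.

```python
def coprocessor_partial(rows, predicate):
    """Runs next to the data: filter and pre-aggregate one node's rows."""
    matched = [r for r in rows if predicate(r)]
    return {"count": len(matched), "sum": sum(r["amount"] for r in matched)}


def pushed_down_avg(regions, predicate):
    """Runs in the SQL layer: merge the partial results instead of fetching raw rows."""
    partials = [coprocessor_partial(rows, predicate) for rows in regions]
    total = sum(p["count"] for p in partials)
    return sum(p["sum"] for p in partials) / total if total else None


regions = [
    [{"amount": 10}, {"amount": 30}],
    [{"amount": 50}, {"amount": 5}],
]
avg = pushed_down_avg(regions, lambda r: r["amount"] >= 10)
```

Only one small dictionary per node crosses the network here, rather than every row, which is the main saving that push-down is after.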

  9. Key Operators Are Distributed

  10. Online DDL Algorithm

    1. No Sharded-Table Concept in TiDB

    2. The DDL process is divided into several states, such as delete-only, write-only, and public. Each state is synchronized and kept consistent across all nodes, until the full DDL change is complete.

  11. TiDB-server is the peer-to-peer, stateless, horizontally scalable entry point that handles user SQL directly; any node can accept writes
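The state-by-state DDL process above can be sketched as a tiny state machine. The full sequence used here for adding an index (`none`, `delete-only`, `write-only`, `write-reorganization`, `public`) and the operations allowed in each state are assumptions based on the general online schema change design, not taken from the notes; all names are hypothetical.

```python
# Assumed state sequence for adding an index; the notes only name three of these states.
STATES = ["none", "delete-only", "write-only", "write-reorganization", "public"]

# Which operations on the new index a node performs in each state (assumed).
ALLOWED_OPS = {
    "none": set(),
    "delete-only": {"delete"},
    "write-only": {"delete", "insert"},
    "write-reorganization": {"delete", "insert"},   # existing rows are backfilled here
    "public": {"delete", "insert", "read"},
}


def can_coexist(a, b):
    # Two nodes may disagree by at most one step without corrupting the index.
    return abs(STATES.index(a) - STATES.index(b)) <= 1


def next_state(node_states):
    """Advance only once every node has caught up to the same state."""
    current = set(node_states.values())
    if len(current) != 1:
        raise RuntimeError("nodes are not yet synchronized")
    i = STATES.index(current.pop())
    return STATES[min(i + 1, len(STATES) - 1)]


following = next_state({"tidb-1": "write-only", "tidb-2": "write-only"})
```

Small steps are what keep the change online: a node that is one state behind still handles the index in a way that is compatible with a node that has already moved on.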

  12. Other Functions of TiDB-server

    1. Front-end Functions

      1. Connection and Account Permission Management

      2. MySQL Protocol Encoding and Decoding

      3. Independent SQL Execution

      4. Database Table Metadata and System Variables

    2. Back-end Functions

      1. GC

      2. Execute DDL

      3. Statistics Management

      4. SQL Optimizer and Executor
