Course Notes: Getting Started with TiDB (Intermediate)

translator_bot · June 23, 2024, 12:10pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 课程笔记 TiDB 快速起步（中）

| username: 云散月明

Course Link

TiDB Quick Start

Course Outline

History of Database, Big Data, and TiDB Development
- 01: History and Trends of Database and Big Data Development
- 02: Development of Distributed Relational Databases
- 03: Evolution of TiDB Products and Open Source Community
Overview of TiDB
- 04: What Kind of Database Do We Really Need
- 05: How to Build a Distributed Storage System
- 06: How to Build a Distributed SQL Engine
Selection of Next-Generation HTAP Database
- 07: HTAP Database Based on Distributed Architecture
- 08: Key Technological Innovations of TiDB
- 09: Typical Application Scenarios and User Cases of TiDB
First Experience with TiDB
- 10: First Experience with TiDB

Course Notes (in progress)

Storage Engine
1. Finer-grained Elastic Scaling
2. Highly Concurrent Reads and Writes
3. No Data Loss or Errors
4. Multiple Replicas Ensure Consistency and High Availability
5. Supports Distributed Transactions
Data Structures Are the Core Foundation of Databases
1. B-tree and LSM-tree
  1. The LSM-tree trades extra space for lower write latency by replacing random writes with sequential writes.
  2. A single TiKV node uses RocksDB, a storage engine based on the LSM-tree.
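To make the "sequential writes instead of random writes" point concrete, here is a minimal toy sketch of the LSM-tree idea. It is not how RocksDB is implemented; the class name `TinyLSM` and the `memtable_limit` parameter are made up for illustration. Writes go to an in-memory table and are flushed as whole sorted runs; old versions stay in older runs, which is the space cost mentioned above.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: a mutable in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}              # most recent writes
        self.runs = []                  # flushed sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run, instead of updating
        # existing data in place (which would be random writes).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest run first so the latest version wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed as one sorted run
db.put("a", 3)   # the newer version of "a" shadows the one in the run
```

Reads may have to check several runs, which is why real LSM engines add compaction and filters; both are left out of this sketch.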
Data Replication
1. Consensus Algorithms: Raft, Paxos
2. Scaling Approaches: Pre-sharding (static), Auto-sharding (dynamic)
3. Sharding Algorithms: hash, range, list
  1. Range Sharding
    1. More Efficient Data Scanning
    2. Simple Implementation of Automatic Splitting and Merging
    3. Elasticity First: Shards Can Be Scheduled Automatically
    4. May Encounter Hot Shard Issues
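The properties of range sharding listed above can be sketched in a few lines. This is a toy model under my own assumptions, not TiKV code: the `RangeShards` class, its method names, and the `max_keys` split threshold are all hypothetical. Each shard owns a left-closed, right-open key range, lookups use binary search over the shard start keys, and a shard splits at its middle key when it grows too large.

```python
import bisect


class RangeShards:
    """Toy range-sharded map. Shard i covers [start_keys[i], start_keys[i+1])."""

    def __init__(self):
        self.start_keys = [""]   # the first shard starts at the smallest key
        self.data = [{}]         # one dict per shard

    def locate(self, key):
        # Index of the shard whose range contains key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key, value, max_keys=4):
        i = self.locate(key)
        self.data[i][key] = value
        if len(self.data[i]) > max_keys:
            self._split(i)

    def _split(self, i):
        # Split at the middle key: the left shard keeps [start, mid),
        # the new right shard takes [mid, next_start).
        keys = sorted(self.data[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.data[i].items() if k < mid}
        right = {k: v for k, v in self.data[i].items() if k >= mid}
        self.data[i] = left
        self.start_keys.insert(i + 1, mid)
        self.data.insert(i + 1, right)

    def scan(self, start, end):
        """Return sorted (key, value) pairs with start <= key < end."""
        out = []
        for i in range(self.locate(start), len(self.start_keys)):
            if self.start_keys[i] >= end:
                break
            out.extend((k, v) for k, v in sorted(self.data[i].items())
                       if start <= k < end)
        return out


shards = RangeShards()
for k in "abcdef":
    shards.put(k, k.upper())
print(shards.start_keys)       # shard boundaries after the automatic split
print(shards.scan("b", "e"))   # a range scan only touches adjacent shards
```

The hot shard issue is visible here too: keys that only ever increase always land in the last shard, so that shard receives every write.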
TiKV
1. A Multi-Raft system; data is split into Regions (96 MiB each by default).
2. Each Region is a key range from StartKey to EndKey, a left-closed, right-open interval: [StartKey, EndKey).
3. Data storage, access, replication, and scheduling are all done per Region.
4. Multi-version Concurrency Control: TiKV implements MVCC by appending a version number to the key.
5. The Coprocessor is the TiKV module that reads data and performs computation; every TiKV storage node has one.
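The MVCC point can be illustrated with a small sketch. It assumes, for illustration only, that the version is stored as `MAX_TS - version` so that newer versions of the same key sort first; the `MvccStore` class and its tuple keys are hypothetical and do not reflect TiKV's real binary key encoding.

```python
import bisect

MAX_TS = 2**64 - 1  # assumed upper bound for version numbers


class MvccStore:
    """Toy MVCC store: every write creates a new (key, version) entry."""

    def __init__(self):
        self.keys = []     # sorted composite keys: (user_key, MAX_TS - version)
        self.values = {}   # composite key -> value

    def put(self, user_key, version, value):
        composite = (user_key, MAX_TS - version)
        if composite not in self.values:
            bisect.insort(self.keys, composite)
        self.values[composite] = value

    def get(self, user_key, read_version):
        # The first entry at or after (user_key, MAX_TS - read_version) is
        # the newest version of user_key that is <= read_version.
        i = bisect.bisect_left(self.keys, (user_key, MAX_TS - read_version))
        if i < len(self.keys) and self.keys[i][0] == user_key:
            return self.values[self.keys[i]]
        return None


store = MvccStore()
store.put("k", 10, "old")
store.put("k", 20, "new")
print(store.get("k", 15))   # a reader at version 15 still sees "old"
print(store.get("k", 25))   # a reader at version 25 sees "new"
```

Because old versions are never overwritten, a reader at an older version is not blocked by later writes; cleaning up obsolete versions is the job of GC, which appears later in these notes.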
Distributed Transaction Model
1. Decentralized Two-Phase Commit
2. Google Percolator Transaction Model
3. TiKV Provides a Full Transactional KV API
4. Default Optimistic Transaction Model
5. Default Isolation Level: Snapshot Isolation
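The following is a heavily simplified, single-process sketch of how an optimistic, Percolator-style two-phase commit with snapshot reads fits together. It is an illustration under my own assumptions, not TiKV's implementation: `ToyPercolator`, `ConflictError`, and the local counter that stands in for a timestamp service are all hypothetical, and lock checks on reads, crash recovery, and lock cleanup are left out.

```python
import itertools


class ConflictError(Exception):
    """Raised when prewrite finds a lock or a newer committed version."""


class ToyPercolator:
    def __init__(self):
        self._ts = itertools.count(1)   # stand-in for a timestamp service
        self.writes = {}                # key -> [(commit_ts, value)], ascending
        self.locks = {}                 # key -> (start_ts, primary_key)

    def begin(self):
        return {"start_ts": next(self._ts), "buffer": {}}

    def get(self, txn, key):
        if key in txn["buffer"]:
            return txn["buffer"][key]
        # Snapshot read: newest version committed at or before start_ts.
        for commit_ts, value in reversed(self.writes.get(key, [])):
            if commit_ts <= txn["start_ts"]:
                return value
        return None

    def put(self, txn, key, value):
        txn["buffer"][key] = value      # optimistic: buffered, no locks yet

    def commit(self, txn):
        keys = sorted(txn["buffer"])
        if not keys:
            return None
        primary, start_ts = keys[0], txn["start_ts"]
        locked = []
        try:
            # Phase 1 (prewrite): lock every key and detect conflicts.
            for key in keys:
                if key in self.locks:
                    raise ConflictError(f"{key} is locked")
                versions = self.writes.get(key, [])
                if versions and versions[-1][0] > start_ts:
                    raise ConflictError(f"{key} changed after start_ts")
                self.locks[key] = (start_ts, primary)
                locked.append(key)
        except ConflictError:
            for key in locked:          # roll back our own locks
                del self.locks[key]
            raise
        # Phase 2 (commit): the primary key comes first, then the rest.
        commit_ts = next(self._ts)
        for key in keys:
            self.writes.setdefault(key, []).append((commit_ts, txn["buffer"][key]))
            del self.locks[key]
        return commit_ts


db = ToyPercolator()
t1 = db.begin()
db.put(t1, "x", 1)
db.commit(t1)
```

In the optimistic model conflicts surface only at commit time: a transaction whose snapshot is older than a newer committed write to the same key fails in prewrite and has to be retried.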
Logical Tables and Secondary Indexes Are Both Implemented on Top of KV
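As a rough sketch of the mapping, each row can be stored under a key built from the table ID and row ID, and each secondary index entry under a key built from the table ID, index ID, indexed value, and row ID. The readable string keys and the helper names below are assumptions for illustration; the real encoding is a binary, order-preserving format.

```python
def row_key(table_id, row_id):
    # One KV pair per row: key identifies the table and the row.
    return f"t{table_id}_r{row_id}"


def index_key(table_id, index_id, value, row_id):
    # One KV pair per index entry; the row ID keeps non-unique entries distinct.
    return f"t{table_id}_i{index_id}_{value}_{row_id}"


def encode_row(table_id, row_id, row, indexes):
    """Return the KV pairs for one row. `indexes` maps index_id -> column name."""
    kvs = {row_key(table_id, row_id): row}
    for index_id, column in indexes.items():
        kvs[index_key(table_id, index_id, row[column], row_id)] = None
    return kvs


def lookup_by_index(store, table_id, index_id, value):
    """Find rows through the index: scan the index prefix, then fetch each row."""
    prefix = f"t{table_id}_i{index_id}_{value}_"
    rows = []
    for key in sorted(store):
        if key.startswith(prefix):
            row_id = int(key[len(prefix):])
            rows.append(store[row_key(table_id, row_id)])
    return rows


store = {}
indexes = {1: "age"}   # hypothetical index 1 on the "age" column
store.update(encode_row(10, 1, {"name": "TiDB", "age": 10}, indexes))
store.update(encode_row(10, 2, {"name": "TiKV", "age": 10}, indexes))
store.update(encode_row(10, 3, {"name": "PD", "age": 7}, indexes))
print(lookup_by_index(store, 10, 1, 10))
```

The index lookup is a prefix scan followed by point reads of the row keys, which is why range-ordered KV storage suits this layout.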
Cost-based Optimizer
Main Optimization Strategy of Distributed SQL Engine: Push Down
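A small sketch of why push down helps, using a filtered count as the example. The function names are hypothetical and the "regions" are plain Python lists. With push down, each storage node filters and counts its own rows and returns a single number; without it, every row is shipped to the SQL layer first.

```python
def coprocessor_count(region_rows, predicate):
    # Runs next to the data: filter and count locally, return one number.
    return sum(1 for row in region_rows if predicate(row))


def count_with_pushdown(regions, predicate):
    # The SQL layer only merges the small partial results.
    partials = [coprocessor_count(rows, predicate) for rows in regions]
    return sum(partials), partials


def count_without_pushdown(regions, predicate):
    # Every row is shipped to the SQL layer before filtering.
    shipped = [row for rows in regions for row in rows]
    return sum(1 for row in shipped if predicate(row)), len(shipped)


regions = [
    [{"age": 10}, {"age": 30}],
    [{"age": 40}],
    [{"age": 5}, {"age": 50}, {"age": 60}],
]
older_than_20 = lambda row: row["age"] > 20
print(count_with_pushdown(regions, older_than_20))      # total and per-region partial counts
print(count_without_pushdown(regions, older_than_20))   # total and number of rows shipped
```

Both approaches return the same count, but the pushed-down version transfers one integer per region instead of every row.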
Distributed Execution of Key Operators
Online DDL Algorithm
1. TiDB has no concept of manually sharded tables.
2. The DDL process is divided into several states, such as delete-only, write-only, and public. Each state is synchronized across all nodes before the next one begins, and the DDL is complete once the final state is reached.
TiDB-server is a peer-to-peer, stateless, horizontally scalable entry point that handles user SQL directly; any node can accept writes.
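The state-by-state progression can be sketched as a tiny state machine. The state list below is an assumption about what an "add index" change goes through (the notes only name delete-only, write-only, and public), and the `Cluster` class is hypothetical. The point it illustrates is that the schema moves one state at a time, and only after every node has caught up, so nodes never differ by more than one adjacent state.

```python
# Assumed state sequence for adding an index, for illustration only.
ADD_INDEX_STATES = ["absent", "delete-only", "write-only",
                    "write-reorganization", "public"]


class Cluster:
    """Toy model of schema state propagation across SQL nodes."""

    def __init__(self, node_count):
        self.target = 0                  # index of the current schema state
        self.nodes = [0] * node_count    # state index each node has loaded

    def sync_node(self, i):
        # A node loads the latest schema state.
        self.nodes[i] = self.target

    def advance(self):
        # Move to the next state only when every node is on the current one.
        if any(state != self.target for state in self.nodes):
            raise RuntimeError("all nodes must reach the current state first")
        if self.target < len(ADD_INDEX_STATES) - 1:
            self.target += 1

    def states_in_use(self):
        names = {ADD_INDEX_STATES[s] for s in self.nodes}
        return sorted(names, key=ADD_INDEX_STATES.index)


cluster = Cluster(node_count=2)
cluster.advance()            # schema moves to "delete-only"
cluster.sync_node(0)         # node 0 catches up, node 1 is still on "absent"
print(cluster.states_in_use())
```

Because the cluster never skips a state while nodes lag behind, reads and writes stay consistent even though nodes switch at slightly different times, and the table stays online during the change.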
Other Functions of TiDB-server
1. Front-end Functions
  1. Connection and Account Permission Management
  2. MySQL Protocol Encoding and Decoding
  3. Independent SQL Execution
  4. Database Table Metadata and System Variables
2. Back-end Functions
  1. GC
  2. Execute DDL
  3. Statistics Management
  4. SQL Optimizer and Executor
