TiDB Quick Start Study Notes Part 1

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB快速起步学习笔记一

| username: TiDBer_9R9sHgXn


History of Distributed Systems


Between 2003 and 2006, Google published the papers that became the three pillars of big data:

* GFS - solved the distributed file system problem
* Google BigTable - solved the distributed key-value storage problem
* Google MapReduce - solved how to perform distributed computing and analysis on top of the distributed file system and distributed KV storage

Main challenges of distributed computing

* How to divide and conquer as effectively as possible
* How to achieve global consistency
* How to tolerate faults and partial failures
* How to deal with unreliable networks and network partitions

Famous CAP theorem in distributed systems

* Consistency - replica consistency
* Availability
* Partition tolerance
* A distributed system can guarantee at most two of the three: CA, CP, or AP

Transactions: ACID

The C in CAP and the C in ACID are two different consistencies: the former describes replica consistency, while the latter describes transaction consistency.

RPO: Recovery Point Objective, mainly refers to the amount of data loss that the business system can tolerate.

RTO: Recovery Time Objective, mainly refers to the maximum time that the business can tolerate being out of service.


TiDB's Highly Layered Architecture


Elasticity is the core consideration of the entire architecture design. TiDB is logically divided into three layers:

* Computing engine supporting standard SQL - TiDB Server
* Distributed storage engine - TiKV
* Metadata management and scheduling engine - Placement Driver (PD)
  * Cluster metadata management, including shard (Region) distribution, topology, etc.
  * Distributed transaction ID (timestamp) allocation
  * Scheduling center

Core of the database: Data structure

For single-node storage, TiKV chose RocksDB, an LSM-tree-based engine:

* RocksDB is a very mature LSM-tree storage engine
* Supports atomic batch writes
* Lock-free snapshot reads (Snapshot)
  * This feature plays a role in data replica migration
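To make the LSM-tree idea concrete, here is a toy sketch (an illustration of the general technique, not RocksDB's actual implementation): writes go into an in-memory memtable; when it fills up it is flushed as an immutable sorted run (an "SSTable"); reads check the memtable first, then the runs from newest to oldest.

```python
class TinyLSM:
    """Toy LSM-tree: in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}              # mutable in-memory write buffer
        self.sstables = []              # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # flush: freeze the memtable into a sorted, immutable run
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):   # newest run shadows older ones
            for k, v in run:
                if k == key:
                    return v
        return None

lsm = TinyLSM()
lsm.put("a", 1)
lsm.put("b", 2)      # fills the memtable and triggers a flush
lsm.put("a", 3)      # newer value shadows the flushed one
print(lsm.get("a"))  # → 3
print(lsm.get("b"))  # → 2
```

Writes never modify flushed runs in place; this append-only discipline is what makes batch writes and snapshot reads cheap.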




TiKV adopts a range-based data sharding algorithm.
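Range sharding can be sketched in a few lines (a simplified model of TiKV Regions; the split points below are made up): the ordered key space is cut into contiguous ranges, and a key is routed to the range that contains it.

```python
import bisect

# Split points; range i covers keys between splits[i-1] and splits[i].
# 3 ranges here: (-inf, "g"), ["g", "p"), ["p", +inf)
splits = ["g", "p"]

def region_for(key):
    """Return the index of the range ("Region") that owns this key."""
    return bisect.bisect_right(splits, key)

print(region_for("apple"))   # → 0
print(region_for("grape"))   # → 1
print(region_for("zebra"))   # → 2
```

Because adjacent keys land in the same shard, range scans stay local, and a hot or oversized range can be split at a new boundary without rehashing everything, which is the elasticity TiKV is after.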

Distributed transaction model

* Decentralized two-phase commit
  * Global timestamping through PD (TSO)
  * ~4 million timestamps per second
  * Each TiKV node allocates a separate area to store lock information (CF Lock)
* Google Percolator transaction model
* TiKV supports a complete transactional KV API
* Optimistic transaction model by default
  * Also supports the pessimistic transaction model (version 3.0+)
* Default isolation level: Snapshot Isolation
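A toy Percolator-style two-phase commit might look like the sketch below (a simplified model: real TiKV keeps data, locks, and write records in separate RocksDB column families and gets timestamps from PD's TSO; here a counter stands in for the TSO and plain dicts stand in for storage).

```python
data, locks = {}, {}               # key -> (value, commit_ts); key -> lock
tso = iter(range(1, 1_000_000))    # stand-in for PD's timestamp oracle

def prewrite(mutations, primary):
    """Phase 1: lock every key, pointing each lock at the primary key."""
    start_ts = next(tso)
    for key, value in mutations.items():
        if key in locks:
            raise RuntimeError(f"write conflict on {key}")
        locks[key] = (primary, start_ts, value)
    return start_ts

def commit(mutations, primary, start_ts):
    """Phase 2: commit the primary first, then the secondaries."""
    commit_ts = next(tso)
    for key in [primary] + [k for k in mutations if k != primary]:
        _, lock_ts, value = locks.pop(key)
        assert lock_ts == start_ts            # lock must match our prewrite
        data[key] = (value, commit_ts)
    return commit_ts

muts = {"k1": "v1", "k2": "v2"}
ts = prewrite(muts, primary="k1")
commit(muts, "k1", ts)
print(data["k1"][0], data["k2"][0])   # → v1 v2
```

The design point is that commit atomicity hinges only on the primary key's lock: once the primary is committed, the whole transaction is logically committed, so no central coordinator needs to survive the commit.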

SQL relational model

Implementing logical tables on KV: each table row and each index entry is encoded as a key-value pair in TiKV's ordered key space.
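The key layout below follows TiDB's documented scheme of `t{tableID}_r{rowID}` for rows and `t{tableID}_i{indexID}...` for indexes, with the binary memcomparable encoding simplified to readable strings and the row value simplified to JSON for illustration:

```python
import json

def row_key(table_id, row_id):
    """Row key: real TiDB uses a binary, memcomparable encoding."""
    return f"t{table_id}_r{row_id}"

def index_key(table_id, index_id, index_value):
    """Secondary index key; its value points back at the row id."""
    return f"t{table_id}_i{index_id}_{index_value}"

# INSERT INTO user (id, name, age) VALUES (10, 'Alice', 30), table_id = 1
kv = {
    row_key(1, 10): json.dumps(["Alice", 30]),   # row data
    index_key(1, 1, "Alice"): str(10),           # index entry -> row id
}
print(kv["t1_r10"])   # → ["Alice", 30]
```

Because all keys of one table share the `t{tableID}` prefix and rows are ordered by row id, a table scan or an index range scan maps directly onto a range scan over TiKV's range-sharded key space.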


Using Spark to alleviate the computing-power problem of the data middle platform: it can only serve low-concurrency, heavyweight queries.

Physical isolation is the best resource isolation

Column storage is naturally friendly to OLAP queries - TiFlash
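The friendliness is easy to see in a sketch (plain Python lists standing in for storage pages): an aggregate over one column reads only that column's contiguous array in a column store, instead of touching every field of every row.

```python
# The same table in two physical layouts.
rows = [("Alice", 30), ("Bob", 25), ("Carol", 35)]   # row store
columns = {"name": ["Alice", "Bob", "Carol"],        # column store
           "age":  [30, 25, 35]}

# Row store: every row (all its fields) is touched to sum one column.
row_sum = sum(r[1] for r in rows)

# Column store: only the single contiguous "age" array is scanned.
col_sum = sum(columns["age"])

print(row_sum, col_sum)   # → 90 90
```

Same answer either way, but the column layout reads a fraction of the bytes and compresses far better, which is why analytical engines like TiFlash store data by column.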

Row-to-column data synchronization - a Raft-based solution works best

MPP engine - parallel computing

The TiDB database has already implemented the following features in HTAP technology:

  1. Columnar storage supports real-time writes
  2. MPP solves node scalability and parallel computing
  3. Spark can run directly on TiKV
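The MPP idea in point 2 can be sketched as scatter-gather aggregation (a simplified model; node names and shard contents are made up): each node computes a partial result over its own shard in parallel, and a coordinator merges the partials.

```python
from concurrent.futures import ThreadPoolExecutor

# Data spread across 3 "nodes" (shards).
shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def partial_sum(shard):
    """Local work each node does over its own shard."""
    return sum(shard)

# Scatter: every node aggregates its shard in parallel.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_sum, shards))

# Gather: the coordinator merges the partial results.
total = sum(partials)
print(total)   # → 45
```

Adding a node adds a shard and a worker, so throughput scales with the cluster; only the small partial results cross the network.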
| username: TiDBer_21wZg5fm | Original post link

Why does it look like a screenshot of lecture slides?

| username: TIDB-Learner | Original post link

Screenshots lack soul.

| username: miya | Original post link

Screenshots lack soul.