Course Notes: Getting Started with TiDB (Part 2)

translator_bot · June 23, 2024, 12:10pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 课程笔记 TiDB 快速起步（下）

| username: 云散月明

Course Link

TiDB Quick Start

Course Outline

History of Database, Big Data, and TiDB Development
- 01: History and Trends of Database and Big Data Development
- 02: Development of Distributed Relational Databases
- 03: Evolution of TiDB Products and Open Source Community
Overview of TiDB
- 04: What Kind of Database Do We Really Need
- 05: How to Build a Distributed Storage System
- 06: How to Build a Distributed SQL Engine
Selection of Next-Generation HTAP Database
- 07: HTAP Database Based on Distributed Architecture
- 08: Key Technological Innovations of TiDB
- 09: Typical Application Scenarios and User Cases of TiDB
First Experience with TiDB
- 10: First Experience with TiDB

Course Notes (Part 2)

OLTP: Pursues high concurrency and low latency
OLAP: Pursues throughput
TiDB used for data middle platform
1. Massive storage allows aggregation of multiple data sources, real-time data synchronization
2. Supports standard SQL, quick results from multi-table joins
3. Transparent multi-business modules, supports task dimension queries after table aggregation
4. TiDB’s maximum pushdown mechanism and parallel hash join operators
Introduce Spark (only provides heavyweight queries with low concurrency) to alleviate the computational power issue of the data middle platform
Column storage is naturally friendly to OLAP queries
TiFlash connects to multi-raft groups as a raft learner, using asynchronous data transmission, imposing very little burden on TiKV
Introduce MPP to address computational power mismatch
Replace computing resources with network and storage costs
Next steps in HTAP exploration
1. Unified data services
2. Iteration of embedded product functions, completed by specific products for HTAP
3. Integration of multiple technology stacks and products, forming HTAP services through data linkage
Stages of Data Warehouse
1. Batch processing (ETL) offline data warehouse
2. Lambda architecture combining batch and stream processing
3. Kappa architecture focusing on stream processing
Distributed KV storage system
Distributed SQL computing system
Distributed HTAP architecture system
Automatic sharding technology as the basis for finer-grained elasticity
Elastic sharding builds a dynamic system
1. 96MB auto-increment sharding
2. 20MB merged sharding
More discrete replication groups based on multi-raft
Linear write scalability based on multi-raft
Cross-IDC multi-node writes for single tables based on multi-raft
Decentralized distributed transactions
Local read and geo-partition
TP and AP integration under larger data capacity
Unified data services: TiDB’s CBO can collect row and column cost models for configuration
Typical scenarios
1. OLTP Scale High scalability online (high concurrency, large data volume, high availability)
2. Real-time HTAP
Sharding, database splitting, middleware Proxy
Performance degradation and B-tree due to oversized tables
TiUP is a cluster operation and maintenance tool introduced in TiDB 4.0
TiUP’s playground component is used to deploy local clusters