Course Notes: Getting Started with TiDB (Part 2)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 课程笔记 TiDB 快速起步(下)

username: 云散月明

Course Link

TiDB Quick Start

Course Outline

  • History of Database, Big Data, and TiDB Development

    • 01: History and Trends of Database and Big Data Development

    • 02: Development of Distributed Relational Databases

    • 03: Evolution of TiDB Products and Open Source Community

  • Overview of TiDB

    • 04: What Kind of Database Do We Really Need

    • 05: How to Build a Distributed Storage System

    • 06: How to Build a Distributed SQL Engine

  • Selection of Next-Generation HTAP Database

    • 07: HTAP Database Based on Distributed Architecture

    • 08: Key Technological Innovations of TiDB

    • 09: Typical Application Scenarios and User Cases of TiDB

  • First Experience with TiDB

    • 10: First Experience with TiDB

Course Notes (Part 2)

  1. OLTP: Pursues high concurrency and low latency

  2. OLAP: Pursues throughput

  3. TiDB used as a data middle platform

    1. Massive storage allows multiple data sources to be aggregated and synchronized in real time

    2. Supports standard SQL, quick results from multi-table joins

    3. Transparent across multiple business modules; supports task-dimension queries after tables are aggregated

    4. TiDB pushes computation down as far as possible and uses parallel operators such as hash join

  4. Spark is introduced to ease the computing-power shortfall of the data middle platform, but it only serves heavyweight queries at low concurrency

  5. Column storage is naturally friendly to OLAP queries

  6. TiFlash joins the Multi-Raft groups as a Raft learner and receives data asynchronously, so it places very little load on TiKV

  7. MPP is introduced to address the mismatch in computing power

  8. Trade network and storage costs for computing resources

  9. Next steps in HTAP exploration

    1. Unified data services

    2. Iterating on embedded product features, so that a specific product delivers HTAP on its own

    3. Integration of multiple technology stacks and products, forming HTAP services through data linkage

  10. Stages of data warehouse evolution

    1. Batch processing (ETL) offline data warehouse

    2. Lambda architecture combining batch and stream processing

    3. Kappa architecture focusing on stream processing

  11. Distributed KV storage system

  12. Distributed SQL computing system

  13. Distributed HTAP architecture system

  14. Automatic sharding technology as the basis for finer-grained elasticity

  15. Elastic sharding builds a dynamic system

    1. A shard splits automatically once it grows to 96 MB

    2. Shards smaller than 20 MB are merged

  16. More discrete replication groups based on Multi-Raft

  17. Linear write scalability based on Multi-Raft

  18. Cross-IDC multi-node writes to a single table based on Multi-Raft

  19. Decentralized distributed transactions

  20. Local read and geo-partition

  21. TP and AP integration under larger data capacity

  22. Unified data services: TiDB's cost-based optimizer (CBO) can collect cost models for both row storage and column storage and use them to choose a plan

  23. Typical scenarios

    1. OLTP Scale: highly scalable online transaction processing (high concurrency, large data volume, high availability)

    2. Real-time HTAP

  24. Table sharding, database splitting, and proxy middleware

  25. Performance degradation caused by oversized tables and their B-tree indexes

  26. TiUP is a cluster operation and maintenance tool introduced in TiDB 4.0

  27. TiUP’s playground component is used to deploy local clusters