Application of TiDB Database in a Provincial Maternal and Child Health Business System

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB数据库在某省妇幼业务系统应用

| username: 我是人间不清醒

Business Background

The construction content of a provincial maternal and child health management system includes: maternal and child health information database; maternal and child health information collection system, maternal and child health information management and analysis system; mother and child health management APP, maternal and child health management and analysis APP, and other 62 functional modules.

Original Database Architecture

Original Technical Architecture and Pain Points

We chose Starrocks as the analysis layer and implemented real-time + offline data synchronization through the DataX + CloudCanal model. As the business iterates, this architecture is not very suitable for the development needs of maternal and child business.

The architecture is generally divided into four parts from bottom to top:

  • Data Layer: The source data sources are mainly MySQL relational databases. Five sets (10 units) of master-slave setups are built to undertake 14 sub-databases and 1 main database, with over 700 tables in the MySQL sub-databases.
  • Processing Layer: Using CloudCanal + DataX for real-time and offline data synchronization.

Offline: Constructing DM theme data marts for reports, large screens, and data exchange services in an offline manner using the DataX tool.

Real-time: Using CloudCanal to synchronize MySQL data 1:1 to Starrocks.

  • Analysis Layer: The analysis layer will store calculated indicator data and intermediate result data for accelerated queries.

  • Using three 32C128G machines to build an SR cluster, corresponding to report business, large screen business, data exchange services, and query acceleration.

  • Pain Points:

  • Frequent business changes: Initially, to meet business needs, database structure changes were made at least twice a week. After negotiating with the business unit, it was controlled to once a week. Changes require DBAs to adjust the 14 city MySQL databases, Starrocks, and data scheduling.

  • Server resource waste: MySQL uses a master-slave architecture for data security, and the resources of the slave database are not effectively utilized.

  • Impact of business updates on business: The application layer uses a microservice architecture, involving many developers who only focus on their own business modules. They submit DDL statements through the SQL audit platform, which are finally executed by the DBA. The DBA is not familiar with the business and cannot check whether the DDL statements meet the actual business needs. This causes frequent anomalies the day after the platform is released (some issues that cannot be reproduced in the test environment).

  • Inability to improve SQL optimization skills of R&D and DBA personnel: Since MySQL data is synchronized 1:1 to Starrocks, all complex queries are replaced by MPP. SQL optimization and reasonable table splitting are no longer focused on (previously thought to be a good thing to improve R&D efficiency). This problem will erupt uniformly after the MPP bottleneck, and can only be solved by upgrading server configurations, not fundamentally solving the problem.

  • DBAs feel overwhelmed: This architecture is a bit exhausting. At a frequency of once a week, DBAs perform DDL operations after 6 PM during business off-peak hours. DBAs need to maintain 30 sets of databases, nearly 20,000 tables. After the operation, they release the program and then follow up with testing. If problems occur, rolling back operations at midnight greatly impacts the business.

  • Database segmentation by city is not well compatible with cross-city business services, such as reports that can only be displayed after summarizing databases through wide tables (increasing number of wide tables) and duplicate records, inability to query service records across cities, etc.

  • Unable to apply automated database audit platform, complex decentralized database operations, high difficulty in automation implementation.

Based on the above pain points, we are looking for a better technical architecture…

Database Consolidation

Total Tables Over 100,000 Tables Over 1,000,000 Tables Over 10,000,000 Tables
792 156 58 5

Looking for a database solution to ensure write and Online-DDL operations after database consolidation.

Tested Polardb, TiDB, OceanBase, and finally chose TiDB

TiDB Features

  • Highly compatible with MySQL, in most cases, code can be easily migrated from MySQL to TiDB without modification. Even MySQL clusters with sharding can be migrated in real-time using TiDB’s migration tools.
  • Horizontal scalability, easily achieve horizontal expansion by adding new nodes, expand throughput or storage as needed, and easily handle high concurrency and massive data scenarios.
  • Distributed transactions, TiDB 100% supports standard ACID transactions.
  • Truly financial-grade high availability, compared to traditional master-slave (M-S) replication schemes, the majority election protocol based on Raft can provide financial-grade 100% data strong consistency guarantee, and can achieve automatic failover without losing most replicas, without manual intervention.

TiDB Database Architecture Introduction

Technical Architecture

New Database Architecture

The new architecture removes the Starrocks analysis layer, leveraging TiDB’s TiFlash component OLAP functionality. Data exchange uses CloudCanal, merging the analysis layer and business layer, with business as wide tables, uniformly accelerated through TiFlash. This brings significant improvements in operational costs and business expansion.

Introducing Yearning Automated SQL Audit Platform

Yearning has the following features:

  • Automated SQL statement review, can automatically detect and execute SQL
  • Automatically generate rollback statements after executing DDL/DML statements
  • Review/query audit functions
  • Supports LDAP login/DingTalk and email message push
  • Supports custom review workflows
  • Supports fine-grained permission allocation

“Subsequent special training will explain the database audit platform”

About SQL Optimization

DBAs record, track, and continuously optimize slow statements.

Benefits After Analysis Layer Transformation

After replacing Starrocks with TiFlash, the overall query efficiency (compared to Starrocks with city-based sharding) did not significantly improve. After SQL optimization, it basically meets expectations.

Comparison of common queries between TiDB and Starrocks (same data volume and structure)

Query Response Time Comparison (limited by editor SQL pasted images)

Built-in Monitoring TiDB Dashboard

QPS

SQL Statement Analysis

Slow Query

Traffic Visualization

Database/Table Deletion Recovery

Examples

  • Restore the test database deleted by DROP: DROP DATABASE test; FLASHBACK DATABASE test;
  • Restore the test database deleted by DROP and rename it to test1: DROP DATABASE test; FLASHBACK DATABASE test TO test1;

–Restore table deletion example

  • Restore the table data deleted by DROP: DROP TABLE t; FLASHBACK TABLE t;
  • Restore the table data truncated by TRUNCATE. Since the truncated table still exists, the restored table needs to be renamed, otherwise, an error will be reported that table t already exists. TRUNCATE TABLE t; FLASHBACK TABLE t TO t1;

Backup/Restore Exceeds Expectations

TiDB provides BR cluster snapshot backup function, directly backing up logs to MinIO. Currently, the provincial maternal and child health system takes snapshots twice a day at 0:00 and 12:00. Due to storage limitations, only snapshots within one day can be retained, and log backups are not done. (Full snapshot + real-time log backup) can ensure no data loss. BR restores data beyond expectations, restoring 300G data in 20 minutes (v7.1.3). The latest official version v7.6 improves BR restore capability by 10 times.

Trial of One Location Two Centers (Banking Business: At Least Two Locations Three Centers)

Considering the importance of maternal and child data, a one-location two-center setup is implemented on the government cloud. Through TiCDC, the main cluster writes data to the secondary cluster in real-time, while the secondary cluster handles report business and R&D test database environment.

pt-kill application captures running SELECT|ALTER and other DML/DDL queries consuming excessive resources, filters them, then kills them (optional) and sends email/WeChat alerts to DBAs and relevant developers to avoid prolonged slow SQL execution causing harm to the database.

https://github.com/hcymysql/pt-kill

Background operation

shell> nohup php pt-kill.php -u username -p pwd -h 10.10.159.0 -P 3306 -B 10 -I 15 --match-info=‘select|alter’ --match-user=‘dbuser’ --kill --mail --weixin --daemon 1 &

Killed slow statements will be retained in the background log file, and the next step is to optimize the SQL.

New Architecture Effect Explanation

Reasonable Utilization of Server Resources

Comparison Data Layer Processing Layer Analysis Layer
Original Architecture 10 units 2 units 3 units
New Architecture 11 units 1 unit 0 units

Release of Human Resources

In the original architecture, DBA personnel performed DDL operations and adjustments and scheduling on the database according to the release cycle in the data layer and processing layer. The new architecture eliminates the maintenance work of scheduling and introduces an SQL audit platform to achieve automated DDL. However, DBAs need to pay more attention to various indicators of TiDB.

Reduction of Operational Costs

TiDB deployment does not require the support of big data components, making deployment and operation very simple. TiDB is compatible with the MySQL ecosystem, and business use can directly connect using MySQL JDBC, without worrying about SQL syntax differences. The original MySQL master-slave architecture, SR cluster, and scheduling task maintenance work can all be eliminated.

Future Plans

Currently, we have two data architectures, MySQL + Starrocks and TiDB. Each architecture has its advantages (they can also be used in combination). Based on the needs of the business unit, we will determine which architecture to use according to the requirements of different business lines.

| username: Billmay表妹 | Original post link

Thank you for sharing.

| username: TiDBer_aaO4sU46 | Original post link

Learned.

| username: 庐州晓月 | Original post link

Thumbs up

| username: TiDBer_jYQINSnf | Original post link

Very detailed, thumbs up :+1:

| username: Kongdom | Original post link

Thank you for sharing.

| username: Jellybean | Original post link

Great share :+1:

| username: jiayou64 | Original post link

Learned.

| username: mono | Original post link

The data volume is indeed not large. I have a MySQL database with over 1,500 tables. There are more than 200 million tables, with 4 of them having over 10 million records each. I’m still using MySQL to handle it. However, some large queries from backend services have become unbearable, so I migrated to TiDB.

| username: heiwandou | Original post link

Thumbs up

| username: wfxxh | Original post link

Thumbs up

| username: DBAER | Original post link

Not bad, basic gameplay.

| username: 考试没答案 | Original post link

Great share! Thumbs up!

| username: onlyacat | Original post link

Thumbs up

| username: cassblanca | Original post link

Thumbs up! Data is priceless. Even though the data volume is only in the tens of millions, they still implemented same-city disaster recovery. Worthy of praise.

| username: Kongdom | Original post link

Thumbs up

| username: 烂番薯0 | Original post link

Impressive, impressive.

| username: TiDBer_5cwU0ltE | Original post link

How long did the two systems run in parallel before the switch?

| username: WalterWj | Original post link

I recommend posting articles in the column.