I learned about the development history of TiDB and how it emerged to meet the needs of distributed and big data scenarios. I also came to understand TiDB's system architecture, the responsibilities of each component, how the components exchange data, and how a SQL statement is executed.

Course Content:

  • What OLTP, OLAP, and HTAP are.
  • The advantages and disadvantages of OLTP and of OLAP.
  • How HTAP combines the advantages of OLTP and OLAP, allowing unified data management and protocol compatibility without the need for separate deployment.
  • TiDB system architecture and how TiDB components communicate.
  • The working principle of MPP, SQL optimization methods, etc.

Here are my notes from the study:

Problems or Extended Thoughts Encountered During Learning:

  • Problem 1: System components exchange a large amount of data, such as heartbeats, region metadata, and reporting of concurrent information. How can this be handled more efficiently, and how can interactions between components be reduced to improve system performance?
  • Problem 2: Can data synchronization also be managed visually through TiUP to enhance convenience?
  • Extended Thought 1: After four years of using TiDB, the general impression is that its execution efficiency is not as good as that of traditional relational databases like Oracle. How can performance be further improved?

Problem 1:
Look at this

Problem 2:
I’m not sure if you’re asking about DM or TiCDC.
DM has an interface, but it’s not very user-friendly. When there are many tasks, it opens very slowly.

TiCDC mainly uses Grafana to monitor synchronization status.

Extended Thought 1:
Saying it's not as good as Oracle is already high praise. I'd recommend analyzing specific problems case by case. If a SQL statement is slow, it's best to post its execution plan.
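
To make it easier to post a plan, here is a minimal Python sketch, not a definitive tool. It assumes you already have a DB-API style cursor connected to TiDB over the MySQL protocol. The helper name `explain_analyze`, the `orders` table, and the stand-in `FakeCursor` are all hypothetical; `FakeCursor` is only there so the snippet runs without a cluster. Keep in mind that `EXPLAIN ANALYZE` actually executes the statement, so use plain `EXPLAIN` for statements that modify data.

```python
def explain_analyze(cursor, sql):
    """Run EXPLAIN ANALYZE on `sql` and return the plan as printable text.

    `cursor` is assumed to be any DB-API 2.0 cursor connected to TiDB.
    """
    # Drop surrounding whitespace and a trailing semicolon before prefixing.
    statement = "EXPLAIN ANALYZE " + sql.strip().rstrip(";").rstrip()
    cursor.execute(statement)
    rows = cursor.fetchall()
    # One plan operator per line, columns separated by " | ".
    return "\n".join(" | ".join(str(col) for col in row) for row in rows)


class FakeCursor:
    """Hypothetical stand-in for a real cursor; returns placeholder rows."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        # Record the statement instead of sending it to a database.
        self.executed.append(statement)

    def fetchall(self):
        # Placeholder values only, not real TiDB output.
        return [("<operator>", "<task>", "<operator info>")]


if __name__ == "__main__":
    cur = FakeCursor()
    print(explain_analyze(cur, "SELECT * FROM orders WHERE id = 1;"))
```

With a real connection, you would pass the driver's cursor instead of `FakeCursor` and paste the returned text into your post.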

I am also learning. After reading your summary, I have gained a lot. Thank you.

TiDB's execution efficiency is not as good as that of relational databases like Oracle, and the difference is quite obvious on a single machine. To make TiDB catch up with Oracle, just add more hardware, since distributed databases are good at horizontal scaling. In our test environment with 3 TiKV nodes, OLAP queries on the same data perform basically on par with Oracle even without TiFlash. With TiFlash, they are even faster than Oracle.

The diagram is really well done.

As a distributed system, TiDB can meet financial-grade requirements for consistency, timeliness, and security, so there is no need to worry about processing efficiency. Execution efficiency, however, depends on the number of machines and clusters, which makes it difficult to compare directly with other databases without additional information.

After 40 years of development and refinement by users worldwide, Oracle has reached an extreme level of performance in all aspects. I believe we should consider the differences between distributed and centralized databases, or look at these databases from the perspective of their applicable scenarios. Additionally, a database is a product that requires time to mature. With further development, leading domestic databases can also become world-class products.

Recognize the gap, but do not belittle yourself.