Using TiDB to Verify Whether Value Investing is Truly Feasible

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 用tidb验证价值投资是否真的可行

| username: tidb狂热爱好者

The method is as follows:

  1. Store minute-level data for US stocks in TiDB.
  2. Calculate the ROE (return on equity) of each stock.
  3. Group the stocks into three categories by ROE: ROE greater than 1, ROE greater than 0 (up to 1), and ROE less than 0.
  4. Calculate the average return of each category to see whether stocks with higher ROE have higher average returns.
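A minimal sketch of steps 2 to 4 in Python. It assumes ROE is a plain ratio (net income / shareholders' equity, so 1 means 100%) and that each stock's return is its last minute close divided by its first minute close, minus 1, over the test period. The post specifies neither, so the tuple layout and the function names (`roe`, `period_return`, `roe_bucket`, `average_return_by_bucket`) are hypothetical. In practice the minute bars and financials would be read from TiDB; here they are passed in as plain lists. A stock with ROE exactly 0 fits none of the three categories and is skipped. If ROE was meant as a percentage, the threshold of 1 would need adjusting.

```python
from statistics import mean


def roe(net_income, shareholders_equity):
    """Return on equity as a ratio (0.15 means 15%). Assumed definition."""
    return net_income / shareholders_equity


def period_return(minute_closes):
    """Simple return over the period: last close / first close - 1."""
    return minute_closes[-1] / minute_closes[0] - 1


def roe_bucket(value):
    """Map an ROE ratio to one of the three categories, or None for ROE == 0."""
    if value > 1:
        return "ROE > 1"
    if value > 0:
        return "0 < ROE <= 1"
    if value < 0:
        return "ROE < 0"
    return None  # exactly 0: not covered by the three categories


def average_return_by_bucket(stocks):
    """stocks: iterable of (net_income, equity, minute_closes) tuples (hypothetical layout)."""
    buckets = {}
    for net_income, equity, closes in stocks:
        bucket = roe_bucket(roe(net_income, equity))
        if bucket is None:
            continue
        buckets.setdefault(bucket, []).append(period_return(closes))
    # Average return per category
    return {name: mean(returns) for name, returns in buckets.items()}


if __name__ == "__main__":
    # Made-up illustrative numbers, not market data
    sample = [
        (120, 100, [10.0, 12.0]),  # ROE 1.2,   return about +20%
        (15, 100, [20.0, 21.0]),   # ROE 0.15,  return about +5%
        (-5, 100, [30.0, 27.0]),   # ROE -0.05, return about -10%
    ]
    print(average_return_by_bucket(sample))
```

In TiDB itself the same grouping could be pushed into SQL with a `CASE` expression on ROE plus `GROUP BY`, so the minute-level data never has to leave the database.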

| username: TiDBer_小阿飞

The database is the underlying layer, the foundation; middleware, the application layer, the network layer, and so on sit on top of it. It seems somewhat related to what you're describing, but also not really. If the concern is computing power and storage capacity, TiDB's HTAP capabilities and its separation of compute and storage should meet your needs.

| username: tidb狂热爱好者

Minute-level data for every stock, every day, adds up to a large volume. The electricity-meter data at Zhejiang Electric Power is too large for a single database: it generates 50 TB a day. Data is everywhere in our lives. KFC, where you probably eat often, runs 100 PostgreSQL (PG) databases. When PG went down for a day, the entire cashier system stopped working, causing significant losses.

During KFC's collaboration with Genshin Impact, crowds of players went to buy KFC and the cashier system crashed again. So, do you think they should use TiDB? Wang Jun has recently been helping KFC recruit. If you know TiDB, give it a try: a monthly salary of over ten thousand is not a dream. They might even offer thirty to fifty thousand, and you could happily maintain their TiDB databases.

| username: tidb狂热爱好者

In fact, most programs around us are what could be called database applications: just one database and one application. Many companies show off with distributed architectures and cross-system interactions, running piles of Kafka and Redis instances. This introduces a whole series of middleware, application, and network layers, which in turn gave rise to technologies like APM (application performance monitoring), call-chain monitoring, and distributed tracing. Yet these systems still can't handle sudden traffic spikes: when peak traffic hits, the business goes down.

So isn't it better for a business to have just two layers, the database and the application? Taobao's monolithic application runs just fine. At most, you might put Kafka in front to queue writes when the database can't keep up. With TiDB, you might not even need Kafka: traffic can hit the database directly and continuously, which avoids the risk of middleware such as Redis crashing and taking the database down with it.
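To make the "queue writes in Kafka" remark concrete, here is a minimal in-process model of peak shaving. It assumes, hypothetically, that the database can absorb a fixed number of writes per tick; a plain `deque` stands in for Kafka, and `WriteBuffer` and `simulate_burst` are illustrative names, not any real API.

```python
from collections import deque


class WriteBuffer:
    """In-process stand-in for a Kafka-style queue in front of a database.

    Producers enqueue writes as fast as they arrive; the consumer drains at
    most `per_tick` writes per tick, the rate the database is assumed to
    sustain. A spike becomes a backlog instead of an overload.
    """

    def __init__(self, per_tick):
        self.per_tick = per_tick
        self.pending = deque()
        self.written = []  # stands in for rows committed to the database

    def enqueue(self, record):
        self.pending.append(record)

    def drain_one_tick(self):
        """Write up to per_tick queued records in arrival order; return how many."""
        n = min(self.per_tick, len(self.pending))
        for _ in range(n):
            self.written.append(self.pending.popleft())
        return n


def simulate_burst(burst_size, per_tick):
    """Enqueue a whole burst at once, then drain it; return writes per tick."""
    buf = WriteBuffer(per_tick)
    for i in range(burst_size):
        buf.enqueue(i)
    ticks = []
    while buf.pending:
        ticks.append(buf.drain_one_tick())
    return ticks


if __name__ == "__main__":
    print(simulate_burst(350, 100))  # [100, 100, 100, 50]
```

The burst of 350 writes is spread over four ticks instead of hitting the database at once. The trade-off is added latency for queued writes, and a real queue must also persist the backlog, which is what Kafka provides.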

This is the point about steady, continuous traffic that PingCAP CTO Huang Dongxu made in his article:

DynamoDB observed that sudden traffic spikes are one of the root causes of cascading failures, and a common cause of such spikes is cache failure. We generally assume that the higher the cache hit rate, the better (the paper cites a hit rate of about 99.75% for the request routers' cache), but such a high hit rate also means that when the cache fails (or during warm-up, when many new nodes join at once), the metadata service must absorb a 400-fold traffic surge in the worst case, going from 0.25% of requests to 100%. DynamoDB addresses this in two ways:

  1. Adding a distributed in-memory cache, MemDS, between the request routers and the metadata service. When a request router misses its local cache, it does not go to the metadata service directly; it first queries MemDS, which fetches from the metadata service in the background to fill the cache. Adding a cache layer for peak shaving is a common approach, essentially another layer of insurance.
  2. The second method is very clever. The request routers fetch metadata through MemDS. What happens when a request misses the MemDS cache is easy to understand. The clever part is that even on a cache hit, MemDS still queries the metadata service asynchronously, for two reasons: (1) it keeps the data already cached in MemDS as fresh as possible, and (2) it gives the metadata service a “stable” (though possibly larger) load.

In other words, a “stable” but larger load is like practicing in the water every day: when the flood comes, you are confident you can handle it. :slight_smile:
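The refresh-on-hit idea can be modeled in a few lines. This is a toy sketch, not DynamoDB's implementation: `MetadataService`, `RefreshOnHitCache`, and the key `"orders"` are hypothetical, and the "background" refresh is drained explicitly instead of running on another thread so the behavior is deterministic. `worst_case_surge` only reproduces the 0.25% to 100% arithmetic from the quote.

```python
from collections import deque


class MetadataService:
    """Hypothetical stand-in for the metadata service; counts requests."""

    def __init__(self, table):
        self.table = table
        self.calls = 0

    def lookup(self, key):
        self.calls += 1
        return self.table[key]


class RefreshOnHitCache:
    """MemDS-style cache: every lookup eventually reaches the backend.

    Misses are filled synchronously. Hits return the cached value at once
    and queue a refresh, so backend load tracks request volume rather than
    the miss rate.
    """

    def __init__(self, backend):
        self.backend = backend
        self.entries = {}
        self.refresh_queue = deque()

    def get(self, key):
        if key in self.entries:
            self.refresh_queue.append(key)  # hit: serve now, refresh later
            return self.entries[key]
        value = self.backend.lookup(key)  # miss: fetch and fill
        self.entries[key] = value
        return value

    def run_background_refresh(self):
        """Drain queued refreshes (a real system would do this asynchronously)."""
        while self.refresh_queue:
            key = self.refresh_queue.popleft()
            self.entries[key] = self.backend.lookup(key)


def worst_case_surge(miss_percent):
    """Backend load multiplier if a cache with this miss rate (in %) fails completely."""
    return 100 / miss_percent


if __name__ == "__main__":
    svc = MetadataService({"orders": "partition-map-v1"})
    cache = RefreshOnHitCache(svc)
    for _ in range(1000):
        cache.get("orders")
    cache.run_background_refresh()
    print(svc.calls)               # 1000: one backend request per lookup
    print(worst_case_surge(0.25))  # 400.0: 0.25% -> 100% of traffic
```

Because hits still generate backend requests, the metadata service sees roughly one request per lookup whether the cache is warm or empty. Wiping `entries` turns the next lookup into a synchronous miss but does not change that rate, which is the “stable” load the quote describes.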

Most applications are database applications for the following reasons:

  1. Data Storage and Management: Applications usually need to store and manage large amounts of data, such as user information, business data, transaction records, etc. Databases provide an efficient way to organize, store, and manage this data.
  2. Data Consistency and Integrity: Databases ensure data consistency and integrity, preventing data loss, duplication, or inconsistency.
  3. Efficient Data Retrieval: Databases can quickly search, filter, and extract the required data, improving application performance and response speed.
  4. Data Sharing and Collaboration: Multiple users or components can simultaneously access and operate on the data in the database, promoting team collaboration and information sharing.
  5. Data Security: Databases can set user permissions, encryption, and other security measures to protect data security and privacy.
  6. Data Analysis and Reporting: Databases facilitate the generation of various reports and analyses, helping decision-makers make informed decisions.
  7. Reliability and Fault Tolerance: Databases usually have backup and recovery functions to prevent data loss or damage.
  8. Scalability: Databases can be expanded and upgraded as the business grows and demands change.
  9. Standardization and Normalization: Following certain database design principles and standards helps improve data quality and maintainability.
  10. Industry Standards and Mature Technology: Database technology has been developed and validated over a long period, making it a mature and reliable technology.

Most applications around us do not involve algorithms; they just compute and store data in the database. In simple terms, it’s just CRUD (Create, Read, Update, Delete).
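A complete CRUD cycle really is this small. The sketch below uses Python's built-in `sqlite3` only so it runs without a server; it is a stand-in, not TiDB. TiDB speaks the MySQL protocol, so against a real cluster you would use a MySQL driver, whose placeholders are typically `%s` rather than `?`. The `minute_bar` table, the `XYZ` symbol, and the prices are hypothetical.

```python
import sqlite3


def crud_demo():
    """Run one create/read/update/delete cycle and return the remaining rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE minute_bar ("
            " symbol TEXT NOT NULL,"
            " ts TEXT NOT NULL,"
            " close REAL NOT NULL,"
            " PRIMARY KEY (symbol, ts))"
        )
        # Create: two made-up minute bars (illustrative values, not market data)
        conn.executemany(
            "INSERT INTO minute_bar (symbol, ts, close) VALUES (?, ?, ?)",
            [("XYZ", "2024-01-02 09:30", 100.0),
             ("XYZ", "2024-01-02 09:31", 101.0)],
        )
        # Update: correct the second bar's close
        conn.execute(
            "UPDATE minute_bar SET close = ? WHERE symbol = ? AND ts = ?",
            (101.5, "XYZ", "2024-01-02 09:31"),
        )
        # Delete: drop the first bar
        conn.execute(
            "DELETE FROM minute_bar WHERE symbol = ? AND ts = ?",
            ("XYZ", "2024-01-02 09:30"),
        )
        conn.commit()
        # Read: what is left
        return conn.execute(
            "SELECT symbol, ts, close FROM minute_bar ORDER BY ts"
        ).fetchall()
    finally:
        conn.close()  # always release the connection


if __name__ == "__main__":
    print(crud_demo())  # [('XYZ', '2024-01-02 09:31', 101.5)]
```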

So why do databases still often crash?

  1. Lack of Experience or Knowledge: Developers may not fully understand the features and best practices of databases.
  2. Incorrect Design or Implementation: The database architecture may be poorly designed, or there may be logical errors in the code.
  3. Insufficient Performance Optimization: Lack of proper performance optimization can lead to high database load.
  4. High Concurrency: Poor handling of highly concurrent database access.
  5. Improper Exception Handling: Inadequate handling of exceptions can lead to database crashes.
  6. Lack of Testing: Insufficient testing of the code may leave potential issues undiscovered.
  7. Poor Resource Management: For example, not properly releasing database connections and other resources.
  8. Database Configuration Issues: Inappropriate database configuration may not meet application needs.

To avoid these issues, the following measures can be taken:

  1. Strengthen developer training and knowledge accumulation.
  2. Conduct reasonable database design and code implementation.
  3. Focus on performance optimization and concurrent processing.
  4. Properly handle exceptions.
  5. Conduct thorough testing.
  6. Manage resources reasonably.
  7. Optimize database configuration.
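Two of the items above, releasing connections and handling exceptions, are easy to get subtly wrong. A minimal sketch with Python's built-in `sqlite3`, again only a stand-in for a TiDB/MySQL driver; the table `t` and the helpers `insert_batch` and `demo` are hypothetical. The point it shows: a failed batch is rolled back as a whole, and the connection is closed no matter what happens.

```python
import sqlite3
from contextlib import closing


def insert_batch(conn, values, fail_at=None):
    """Insert values in one transaction; raise at index fail_at to simulate an error."""
    with conn:  # commits on success, rolls back if an exception escapes
        for i, v in enumerate(values):
            if i == fail_at:
                raise RuntimeError("simulated failure mid-batch")
            conn.execute("INSERT INTO t (x) VALUES (?)", (v,))


def demo(db_path=":memory:"):
    """Return the row count after one good batch and one failed batch."""
    # closing() guarantees conn.close() runs even if something below raises;
    # `with conn:` by itself only manages the transaction, it does not close.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE t (x INTEGER)")
        insert_batch(conn, [1, 2, 3])
        try:
            insert_batch(conn, [4, 5, 6], fail_at=2)
        except RuntimeError:
            pass  # rows 4 and 5 were rolled back, not half-committed
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


if __name__ == "__main__":
    print(demo())  # 3
```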

Dongxu's original blog post is "Some notes on DynamoDB 2022 paper". All Chinese content was recently deleted, so it needs to be backed up.

| username: Kongdom

:thinking: Doesn’t this require setting up a data center?

| username: TiDBer_JUi6UvZm

| username: tidb狂热爱好者

A data center is essential; there are tens of millions at stake.

| username: dba远航

A database is just for storing data; how you present various data depends on the performance of your front-end application.

| username: TiDBer_小阿飞

Great lesson!

| username: jiayou64

Awesome! :cow:

| username: 呢莫不爱吃鱼

:+1: :+1:

| username: zhang_2023

| username: 洪七表哥

This has nothing to do with whether or not TiDB is used.

| username: zhaokede

:+1: Learning

| username: Jack-li

Awesome, learned something new.