[Discussion Post] When is it necessary to scale up or down?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【交流贴】什么情况下需要扩容缩容?

| username: Billmay表妹

Questions about scaling up and scaling down come up frequently in the community.

Question 1:
When do we need to scale up?

Question 2:
When do we need to scale down?

Question 3:
Why do we need to scale up before scaling down?

| username: ohammer | Original post link

  1. Scaling up or down can apply to both computing and storage resources, depending on the business scenario. For a large event where concurrency is much higher than usual, for example, consider adding TiDB server nodes and then scaling down after the event ends.

  2. If storage space or IO performance is insufficient, consider scaling up TiKV.
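
To make the two points above concrete, here is a minimal Python sketch of the "which component do I scale?" decision. The `Bottleneck` names and the `component_to_scale` helper are hypothetical and exist only for illustration; the mapping itself is just the rule of thumb from this reply, not an official sizing procedure.

```python
from enum import Enum


class Bottleneck(Enum):
    """Hypothetical labels for the kinds of shortage described above."""
    CONCURRENCY = "concurrency"      # far more simultaneous requests than usual
    STORAGE_SPACE = "storage_space"  # disks are filling up
    IO = "io"                        # disk IO performance is insufficient


# Rule of thumb from the reply: concurrency pressure points at the
# TiDB server (compute) layer, storage or IO pressure points at TiKV.
COMPONENT_FOR = {
    Bottleneck.CONCURRENCY: "TiDB",
    Bottleneck.STORAGE_SPACE: "TiKV",
    Bottleneck.IO: "TiKV",
}


def component_to_scale(bottleneck: Bottleneck) -> str:
    """Return the name of the component to consider scaling up."""
    return COMPONENT_FOR[bottleneck]
```

For example, `component_to_scale(Bottleneck.IO)` returns `"TiKV"`. In practice you would confirm the bottleneck from monitoring before changing the cluster.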

| username: Jellybean | Original post link

Technically, every component of TiDB can be scaled up or down as needed. I’ll explain this informally from TiDB’s point of view, just to convey the idea.

  1. When is scaling up needed?
    To use a rough analogy: if you need to slaughter a cow and only have a small knife, you need a bigger knife. Switching to the bigger knife is scaling up.
    Suppose business traffic grows sharply, for example from 10,000 to 1 million requests per day during a Double 11 (Singles' Day) promotion. Assuming there are no issues on the application side, the database might no longer keep up, and users would experience slow, laggy access. To improve processing capacity, you add resources (instances, machines, memory, disks, etc.). Adding those resources is called scaling up. Which component to scale up depends on whether the bottleneck is in computation (TiDB), storage (TiKV), or scheduling (PD), and the administrator needs to determine that in advance.

  2. When is scaling down needed?
    The reverse analogy: you are using a sledgehammer to crack a nut.
    For example, if a TiDB cluster can handle 1 million requests per day but current traffic is only 10,000 per day, you would reclaim the excess resources (instances, machines, disks, memory). Reclaiming those resources is scaling down.

  3. Why scale up first and then scale down?
    Scaling up and scaling down are two independent operations. Scaling up does not necessarily mean you have to scale down later; it depends on the traffic or future business volume.
    In some scenarios, if you anticipate a long-term increase in business volume, the resources added during scaling up will continue to be used, and there will be no need to scale down.
    In other scenarios, such as temporary scaling up to handle an event, once the event is over, the temporarily added resources need to be reclaimed, which is when scaling down is necessary.
    In summary, there is no inherent sequence between the two operations. Whether to scale up or down is entirely determined by the business needs.
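
The same reasoning can be written as a small calculation. This is only a sketch under stated assumptions: the per-instance capacity figure, the `min_instances` floor, and the function names are all hypothetical, and real capacity planning would rely on monitoring data rather than a single daily-traffic number.

```python
import math


def instances_needed(daily_requests: int,
                     requests_per_instance: int,
                     min_instances: int = 1) -> int:
    """Instances required for the given daily traffic.

    requests_per_instance is an assumed capacity figure; min_instances
    is a hypothetical floor so the result never drops to zero.
    """
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    needed = math.ceil(daily_requests / requests_per_instance)
    return max(min_instances, needed)


def scaling_delta(current_instances: int,
                  daily_requests: int,
                  requests_per_instance: int) -> int:
    """Positive: instances to add. Negative: instances to reclaim."""
    target = instances_needed(daily_requests, requests_per_instance)
    return target - current_instances


def describe(delta: int) -> str:
    """Turn the delta into the action discussed in this thread."""
    if delta > 0:
        return f"scale up by {delta}"
    if delta < 0:
        return f"scale down by {-delta}"
    return "no change"
```

Assuming each instance handles 250,000 requests per day, a one-instance cluster facing 1 million requests per day gives `describe(scaling_delta(1, 1_000_000, 250_000))`, which is `"scale up by 3"`. Once traffic falls back to 10,000 per day, `describe(scaling_delta(4, 10_000, 250_000))` is `"scale down by 3"`. Each call looks only at current traffic, which matches the point that the two operations are independent.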

| username: Billmay表妹 | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.