TiKV Expansion Balance Speed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv扩容平衡速度

| username: 舞动梦灵

Hi all, after scaling out TiKV in a TiDB cluster, how fast does the data rebalance? Has anyone measured the approximate speed?

The source side has 1TB and 9 TiKV nodes. Six more TiKV nodes are now being added. Roughly how long will it take to balance the data onto the 6 new nodes, so that the 9 old TiKV nodes can then be scaled in and removed?

The servers use Alibaba Cloud SSD disks.

| username: zhanggame1 | Original post link

You might as well set up a new cluster and migrate the data into it instead of adding 6 nodes and scaling in 9. You can also upgrade to a newer version in the process.

| username: kkpeter | Original post link

Your version is a bit outdated. If the cluster pressure is low, it will still be fast.

| username: 舞动梦灵 | Original post link

We can’t stop the business and we need to move to another cloud, so we’re considering the scale-out and scale-in approach to see whether it works.

| username: 舞动梦灵 | Original post link

What does cluster pressure mean? Low business volume? One cluster is on version 4.0.9 and the other is on 4.0.2. Sigh. I just joined a new company and took these over. Such old versions.

| username: 像风一样的男子 | Original post link

If it’s across clouds, you need to consider the network latency and bandwidth between your data centers. TiDB should be able to handle read and write speeds of several hundred MB/s.

| username: cassblanca | Original post link

If cross-cloud means switching to a different region within the same provider, the provider’s internal network should already be optimized. If you are switching providers, such as from Alibaba Cloud to Huawei Cloud, it will likely be slower. It might be better to deploy a new-version TiDB cluster first and then migrate the data gradually, especially if there is no business pressure.

| username: redgame | Original post link

Please share the data volume and IO speed.

| username: tidb菜鸟一只 | Original post link

Switching clouds this way basically makes the business unusable during the process, and the network latency between different clouds is quite high.

| username: 舞动梦灵 | Original post link

It’s from Alibaba Cloud to Tencent Cloud, and a VPN is likely cheaper than a dedicated line. If we deploy a new environment, we can use BR backup plus CDC real-time synchronization. In that case we would also need to buy tens of terabytes of object storage for the BR backup.

| username: 舞动梦灵 | Original post link

The data volume is 20TB, and the IO speed is currently unknown. We are using Alibaba Cloud’s SSD, and the target is Tencent Cloud’s cloud database.

| username: 舞动梦灵 | Original post link

So you’re saying a dedicated line would be better?

| username: Kongdom | Original post link

You’re even spanning clouds here, so how is the network throughput? If the bandwidth is low, it might take a long time. Isn’t the required time basically data volume divided by bandwidth? I’d expect disk IO not to be slower than the network.
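As a rough illustration of the "data volume divided by bandwidth" estimate, here is a minimal sketch. The 20 TB figure is the one quoted in this thread; the link speeds and the `efficiency` parameter are assumptions chosen for illustration. The result is only a lower bound, since it ignores scheduling throttles, business load, and protocol overhead.

```python
def transfer_hours(data_tb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_tb` decimal terabytes over a link of
    `bandwidth_mbps` megabits per second.

    `efficiency` (0 < efficiency <= 1) is a hypothetical fudge factor for the
    share of the link that is actually usable.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed link speeds; replace them with measured values.
    for mbps in (100, 1000, 10000):
        print(f"{mbps:>6} Mbit/s: {transfer_hours(20, mbps):.1f} h")
```

Under these assumptions, 20 TB over a fully usable 1 Gbit/s link is about 44 hours, so the cross-cloud bandwidth dominates unless the disks are much slower than that.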

| username: TiDB_C罗 | Original post link

How about creating a new cluster on a newer version in the new cloud environment and synchronizing the data to it with a third-party synchronization tool?

| username: dockerfile | Original post link

If it goes fast, it will take 2-3 days. If it’s not urgent, you can take your time.

  1. First, scale out by 6 nodes, then scale in 1 old node.
  2. Wait for that node to finish scaling in before scaling in the next one.
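The two steps above can be sketched as a small plan generator. This is only an illustration: it assumes the cluster is managed with TiUP, and the cluster name, topology file name, and node addresses are hypothetical placeholders. It only builds the command strings; it does not run anything.

```python
from typing import List


def rolling_replacement_plan(cluster: str, scale_out_topology: str,
                             old_nodes: List[str]) -> List[str]:
    """Return the ordered steps: scale out once, then scale in old nodes one by one."""
    steps = [f"tiup cluster scale-out {cluster} {scale_out_topology}"]
    for node in old_nodes:
        steps.append(f"tiup cluster scale-in {cluster} --node {node}")
        # Do not start the next scale-in until this store has finished going offline.
        steps.append(f"# wait: check `tiup cluster display {cluster}` until {node} is fully offline")
    return steps


if __name__ == "__main__":
    # Hypothetical addresses for the 9 old TiKV nodes.
    old = [f"10.0.0.{i}:20160" for i in range(1, 10)]
    for step in rolling_replacement_plan("my-cluster", "scale-out.yaml", old):
        print(step)
```

Scaling in one node at a time keeps the amount of data being moved at any moment bounded, at the cost of a longer overall migration.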

| username: 大飞哥online | Original post link

Whether it’s across regions or across vendors, you can start by transferring some large files in both directions to check the bandwidth, or ping with large packets for a day to see whether there is any packet loss.
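To turn such a test into numbers, a minimal sketch like the following could help. It assumes the summary line printed by common `ping` implementations ("N packets transmitted, M received"); the sample figures in the usage part are made up for illustration.

```python
import re


def packet_loss_pct(ping_summary: str) -> float:
    """Compute the packet loss percentage from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_summary)
    if match is None:
        raise ValueError("unrecognized ping summary format")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100 * (sent - received) / sent


def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Average throughput in megabits per second for a timed transfer."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return bytes_transferred * 8 / seconds / 1e6


if __name__ == "__main__":
    # Made-up sample values.
    print(packet_loss_pct("1000 packets transmitted, 990 received, 1% packet loss, time 999ms"))
    print(throughput_mbps(10_000_000_000, 100.0))
```

The measured throughput can then be used as the bandwidth figure when estimating how long the migration will take.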

| username: 大飞哥online | Original post link

Scaling inherently has some impact on the business, and even more so when it spans different cloud providers. It’s concerning.

| username: 舞动梦灵 | Original post link

What impact does scaling out or in have on existing services? Does it cause slow queries or slow responses?

| username: 舞动梦灵 | Original post link

Yes. The disks are at least SSD, so IO shouldn’t be the limit, which means it comes down to bandwidth.

| username: Kongdom | Original post link

I thought of a solution: set up a new cluster on Tencent Cloud, do a backup and restore, then use TiCDC for real-time synchronization. Pick a low-traffic period to switch over to Tencent Cloud, and then sync any missed data to the Tencent Cloud side. Generally, there shouldn’t be any missed data.
I wonder if this is feasible.
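One way to check for missed data before the switch is to look at how far the replication checkpoint lags behind the current time. The sketch below assumes the checkpoint is a TiDB TSO, whose upper bits hold a millisecond Unix timestamp above 18 logical bits. The function names and the 5-second threshold are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

TSO_LOGICAL_BITS = 18  # low bits of a TSO hold the logical counter


def tso_to_datetime(tso: int) -> datetime:
    """Extract the physical (wall-clock) part of a TSO as a UTC datetime."""
    physical_ms = tso >> TSO_LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def replication_lag_seconds(checkpoint_tso: int, now: datetime) -> float:
    """Seconds between `now` and the time encoded in the checkpoint TSO."""
    return (now - tso_to_datetime(checkpoint_tso)).total_seconds()


def safe_to_cut_over(checkpoint_tso: int, now: datetime, max_lag_seconds: float = 5.0) -> bool:
    """True if the downstream is at most `max_lag_seconds` behind (assumed threshold)."""
    return replication_lag_seconds(checkpoint_tso, now) <= max_lag_seconds


if __name__ == "__main__":
    # Made-up checkpoint: physical time 1_700_000_000_000 ms, logical counter 7.
    checkpoint = (1_700_000_000_000 << TSO_LOGICAL_BITS) | 7
    now = datetime.fromtimestamp(1_700_000_003, tz=timezone.utc)
    print(replication_lag_seconds(checkpoint, now), safe_to_cut_over(checkpoint, now))
```

In practice, writes on the old cluster would be stopped first, and the switch made only after the checkpoint has passed the time of the last write.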