Experts: if you had 100 TiDB deployments, how would you manage them?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 各位大佬 如果tidb你部署了100套 这个时候你们是怎么管理tidb的

| username: tidb狂热爱好者

For example, at our company the test, dev, stg, and prod environments are all isolated from one another, and the management tools are deployed separately in each one. As a result, every environment has its own control machine, and someone once deleted one of those control machines. Recovering from that was very troublesome. How do you experts handle this?

| username: 数据小黑

Once the S3-based architecture is introduced, won't this problem go away?

| username: dba-kit

TiUP can manage multiple clusters, so in theory each environment needs only one control machine with TiUP installed. What matters is backing up that control machine properly. I recommend putting the TiUP machine on a ZFS file system and running zfs-auto-snapshot to take regular snapshots. Since files on this machine change very little, you can keep snapshots for 30 days or even longer.
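A minimal Python sketch of the snapshot-plus-retention idea, assuming TiUP's home directory sits on a hypothetical ZFS dataset `tank/tiup` and that snapshots use a hypothetical `tiup-YYYYmmdd-HHMMSS` naming scheme. zfs-auto-snapshot does its own rotation, so this only spells out the 30-day policy for anyone who would rather drive it from their own cron job. The functions just build `zfs snapshot` / `zfs destroy` argument lists; nothing is executed.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical dataset that holds TiUP's home directory (for example ~/.tiup).
DATASET = "tank/tiup"
# Hypothetical naming scheme for snapshots taken by this script.
PREFIX = "tiup-"
STAMP_FMT = "%Y%m%d-%H%M%S"


def snapshot_command(dataset: str, now: datetime) -> List[str]:
    """Build a `zfs snapshot dataset@tiup-<UTC timestamp>` command."""
    return ["zfs", "snapshot", f"{dataset}@{PREFIX}{now.strftime(STAMP_FMT)}"]


def expired_snapshots(names: List[str], now: datetime, keep_days: int = 30) -> List[str]:
    """Return this script's snapshots that are older than keep_days.

    `now` must be timezone-aware UTC; snapshot timestamps are parsed as UTC.
    """
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for full_name in names:
        _, _, snap = full_name.partition("@")
        if not snap.startswith(PREFIX):
            continue  # leave snapshots made by other tools (e.g. zfs-auto-snapshot) alone
        try:
            taken = datetime.strptime(snap[len(PREFIX):], STAMP_FMT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # prefix matches but the timestamp does not
        if taken < cutoff:
            expired.append(full_name)
    return expired


def destroy_commands(expired: List[str]) -> List[List[str]]:
    """One `zfs destroy dataset@snapshot` command per expired snapshot."""
    return [["zfs", "destroy", name] for name in expired]
```

To act on them, each list could be passed to `subprocess.run(cmd, check=True)`, and the names for `expired_snapshots` could come from `zfs list -H -o name -t snapshot`.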

| username: vcdog

In this situation you may need to find a suitable open-source management platform and customize it for your needs. A while ago I saw a DBA from Zhuanzhuan share how they manage multiple clusters:

| username: dba-kit

The dev environment in particular is a natural fit for ZFS. All of the dozens of MySQL databases in our dev environment run on ZFS. When DQ or developers accidentally delete data, rolling back and getting the database running again takes only a few minutes.
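A rough Python sketch of the rollback step: pick the newest snapshot taken before the accidental delete and build the `zfs rollback` command for it. The dataset and snapshot names are hypothetical, and the snapshot times are assumed to have been collected already (for example from `zfs list -t snapshot`). MySQL should be stopped before the rollback and started again afterwards.

```python
from datetime import datetime
from typing import Dict, List, Optional


def pick_rollback_target(snapshots: Dict[str, datetime], incident: datetime) -> Optional[str]:
    """Return the newest snapshot taken strictly before the incident, or None."""
    before = [(taken, name) for name, taken in snapshots.items() if taken < incident]
    if not before:
        return None
    return max(before)[1]


def rollback_command(snapshot: str) -> List[str]:
    """Build `zfs rollback -r <snapshot>`.

    -r destroys any snapshots newer than the target; without it zfs refuses
    to roll back to anything but the most recent snapshot.
    """
    return ["zfs", "rollback", "-r", snapshot]
```

Because `-r` throws away the newer snapshots, anything written after the chosen snapshot is gone; that trade-off is usually acceptable in a dev environment.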

| username: BraveChen

When was that?

| username: 我是咖啡哥

If there really are that many, you have to automate. Doing it by hand is exhausting and error-prone.
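A small Python sketch of that kind of automation, meant to run on one environment's control machine. The `INVENTORY` mapping and cluster names are hypothetical; in practice the list could come from `tiup cluster list`. It treats a non-zero exit code from `tiup cluster display <name>` as a failure (for instance, cluster metadata missing after a control machine was lost, assuming TiUP reports that with a non-zero exit) and does not parse the per-node status column. The runner is injectable so the loop can be tested without TiUP installed.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical inventory of the TiUP-managed clusters behind each environment's control machine.
INVENTORY: Dict[str, List[str]] = {
    "dev": ["dev-orders", "dev-users"],
    "stg": ["stg-orders"],
    "prod": ["prod-orders", "prod-users"],
}


def run_command(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def failing_clusters(
    inventory: Dict[str, List[str]],
    env: str,
    runner: Callable[[List[str]], int] = run_command,
) -> List[str]:
    """Run `tiup cluster display <name>` for each cluster in env; return those with a non-zero exit."""
    failed = []
    for name in inventory.get(env, []):
        if runner(["tiup", "cluster", "display", name]) != 0:
            failed.append(name)
    return failed
```

Calling `failing_clusters(INVENTORY, "prod")` from cron or a CI job would give a list of clusters to look at instead of checking them one by one.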

| username: xfworld

With 100 deployments, you have to use K8s…

| username: Kongdom

I can’t even imagine 100 sets~

| username: tidb菜鸟一只

Use k8s with TiDB Operator.
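With the operator, each cluster is a `TidbCluster` custom resource, so a fleet can be generated from a template instead of maintained by hand. The Python sketch below is modeled loosely on the operator's basic example; the version string, storage sizes, replica counts, and inventory are hypothetical, and the field names should be checked against the CRD of the operator version you run.

```python
import json
from typing import Dict, List

# Hypothetical default version; check field names against your TiDB Operator CRD.
TIDB_VERSION = "v7.5.0"


def tidb_cluster(name: str, namespace: str, tikv_replicas: int = 3) -> Dict:
    """Build a minimal TidbCluster custom resource as a plain dict."""
    return {
        "apiVersion": "pingcap.com/v1alpha1",
        "kind": "TidbCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "version": TIDB_VERSION,
            "pd": {"baseImage": "pingcap/pd", "replicas": 3,
                   "requests": {"storage": "10Gi"}, "config": {}},
            "tikv": {"baseImage": "pingcap/tikv", "replicas": tikv_replicas,
                     "requests": {"storage": "100Gi"}, "config": {}},
            "tidb": {"baseImage": "pingcap/tidb", "replicas": 2,
                     "service": {"type": "ClusterIP"}, "config": {}},
        },
    }


def render_all(inventory: Dict[str, List[str]]) -> List[Dict]:
    """One TidbCluster per cluster name, in a namespace named after its environment."""
    return [tidb_cluster(name, env) for env, names in inventory.items() for name in names]


def as_list(manifests: List[Dict]) -> Dict:
    """Wrap the manifests in a Kubernetes `List` so one `kubectl apply -f -` handles them all."""
    return {"apiVersion": "v1", "kind": "List", "items": manifests}


if __name__ == "__main__":
    # Hypothetical inventory: environment -> cluster names.
    inventory = {"dev": ["orders", "users"], "prod": ["orders"]}
    print(json.dumps(as_list(render_all(inventory)), indent=2))
```

Piping the output into `kubectl apply -f -` would create or update every cluster in one step; the operator must already be installed and the target namespaces must already exist.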

| username: ffeenn

Take a look at KubeSphere: "KubeSphere Deploys TiDB Cloud-Native Distributed Database" - Juejin

| username: 啦啦啦啦啦


| username: tidb狂热爱好者

Thank you to all the experts for providing the methods.

| username: tidb狂热爱好者

Based on my own testing, running test clusters on k8s is very convenient.

| username: Running

What do you mean by 100? TiDB itself scales elastically, so what kind of scenario would require 100 separate deployments?

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.