Multi-point Databend TiDB Data Archiving Practices | Speaker: Feng Guangpu

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 多点 Databend 的 TiDB 数据归档实践 | 演讲嘉宾:冯光普

| username: Billmay表妹

TiDB Data Archiving Practice Based on Databend-V2-02

Thanks to Teacher Guangpu for bringing us a high-quality sharing @guangpu

Although it was just discovered, better late than never!

PPT Preview

TiDB Data Archiving Practice Based on Databend-V2-05
TiDB Data Archiving Practice Based on Databend-V2-06
TiDB Data Archiving Practice Based on Databend-V2-08

TiDB Data Archiving Practice Based on Databend-V2-10
TiDB Data Archiving Practice Based on Databend-V2-11
TiDB Data Archiving Practice Based on Databend-V2-12
TiDB Data Archiving Practice Based on Databend-V2-13
TiDB Data Archiving Practice Based on Databend-V2-14
TiDB Data Archiving Practice Based on Databend-V2-15

Main Content of This Sharing:

  • Why choose Databend to archive TiDB
  • Archiving tools, archiving process, practical effects
  • Summary of archiving practice and future prospects for Databend

Why Choose Databend to Archive TiDB

TiDB is used very well in multiple DMALL points. It can achieve transparent expansion, no sense of development, data increase, and unchanged architecture; TiDB gives the impression of no capacity limit, supporting more data; expansion in TiDB is more convenient, adding nodes means expansion, and automatic Rebalance. Based on the above advantages, the more TiDB is used, the more comfortable it becomes, but the cost also increases.

The above figure shows a certain DMALL TiDB cluster running on Tencent Cloud with a total of 24 nodes, each node with nearly 3 T+ space (NVME disk), now the total data volume is 60 T, and QPS is close to 100,000. The data is still increasing, so it is considered necessary to archive TiDB storage.

When choosing archiving, four aspects were compared:

  • Storage cost: the cost of object storage and HDD, SSD, where object storage is 1/10 of HDD, and 1/30 of SSD,
  • Online query capability: but the data cannot be directly placed in object storage, and it also needs to provide external query capability,
  • Support for continuous backup of large tables: this also investigated MariaDB on S3, which requires writing to InnoDB first, and then converting to a read-only s3 engine. To back up a 10 TB table, 10 TB of local space is also needed. Databend writes directly to object storage.
  • Compatibility with MySQL protocol: can maintain the usage habits of TiDB.

Based on the above conditions, DMALL finally chose Databend. The deployment architecture of DMALL Databend is as follows:

Among them, Databend-query is somewhat similar to MySQL server directly connecting to object storage. For user management, permissions, and meta information are stored in Databend-meta. This node can be deployed as a single point, and in production, it can also be deployed in cluster mode.

Archiving Tools, Archiving Process, Practical Effects

DMALL developed a new archiving tool, and the workflow is as follows:

  1. First, create the archive table structure in Databend
  2. Based on small batches, read from the source to the Channel while detecting the source pressure. If the source pressure is high, reduce the reading speed; if the pressure is low, increase the reading speed
  3. When writing to Databend, it can merge writes to achieve larger batches
  4. After successfully writing to Databend, delete from the source to ensure data safety

The archiving task is initiated by developers, reviewed by DBAs, and automated archiving with permanent data retention. Based on the above method, developers can query data from 2-3 years ago and complete it independently.

DMALL adopts single table writing in batches of 10 M each. It is estimated that a single table can archive 1.3 T of data in one day. A single cluster has no limit on tables and can write to the Databend cluster concurrently based on tasks. This way, the daily volume is quite considerable. Additionally, DMALL compared the data compression capabilities of MySQL, TiDB, and Databend:

In MySQL, it occupies 150 G of data, in TiDB it occupies 25 G, and in Databend it occupies 18 G (with only one copy, data availability is guaranteed by object storage).

Based on cost considerations: data from TiDB to Databend changes from 3 copies to 1 copy (object storage handles the copies), object storage does not need to be reserved, and is paid as used, with a unit price of only 1/10-1/30 of HDD, SSD costs. After DMALL used Databend for TiDB archiving, the storage cost was only 2% of the original storage cost.

Summary of Archiving Practice and Future Prospects for Databend

Advantages of Databend in archiving scenarios:

  • Significant cost reduction: based on object storage, cold data storage costs are reduced by 98%. If you are moving from SSD to object storage, the cost reduction can exceed 99%.
  • Cloud neutrality: supports AWS, Azure, GCP, Alibaba Cloud, Tencent Cloud, Huawei Cloud, QingCloud, Volcano Engine, minio, ceph, etc.
  • Developer-friendly: compatible with MySQL protocol, supports online queries, and has good statistical analysis performance
  • Worry-free operation and maintenance: unlimited space, high reliability, maintenance-free, and easy migration

Finally, Guangpu Feng hopes that the Databend ecosystem will be more complete, for example, more complete data synchronization from TP to Databend, and further integration of TP + AP capabilities.

Video Replay:

PPT Download

TiDB Data Archiving Practice Based on Databend-V2.pdf (1.3 MB)


| username: Aionn | Original post link

The data compression effect of TiDB is really good.

| username: TIDB-Learner | Original post link

Learn from the experts.

| username: dba远航 | Original post link


| username: jiaxin | Original post link

GOAT (Greatest of All Time)

| username: 会飞的土拨鼠 | Original post link

Respect to the expert, learning from the expert.

| username: liyuepeng123 | Original post link

Understood, please provide the Chinese text you need translated.

| username: 春风十里 | Original post link

Learn a bit.