TiDB Hot and Cold Data Storage and Archiving

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 冷热数据存储以及归档

| username: TiDBer_kmB0dl9H

[TiDB Usage Environment] Production Environment / Testing / PoC
[Encountered Issues: Problem Phenomenon and Impact]
Scenario: storing historical data for a message center, with hot and cold data separated to reduce resource usage. For example, if TiDB keeps the most recent three months of data, how should the cold data from the past two years be stored? Is table partitioning necessary? How should cold backup archiving be done?
Do any experts have any good solutions?

| username: zhanggame1 | Original post link

Column - TiDB Hot and Cold Storage Separation Solution | TiDB Community

| username: TiDBer_QYr0vohO | Original post link

This can be achieved with Placement Rules in SQL.
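
For reference, a minimal sketch of this approach, assuming the TiKV nodes are started with labels such as `disk=nvme` and `disk=hdd`; the policy, database, table, and partition names are made up for illustration:

```sql
-- Policies pinning replicas to nodes labeled disk=nvme or disk=hdd
CREATE PLACEMENT POLICY hot_storage CONSTRAINTS='[+disk=nvme]';
CREATE PLACEMENT POLICY cold_storage CONSTRAINTS='[+disk=hdd]';

-- Applied per partition of a range-partitioned table
ALTER TABLE msgdb.messages PARTITION p2024 PLACEMENT POLICY=hot_storage;
ALTER TABLE msgdb.messages PARTITION p2022 PLACEMENT POLICY=cold_storage;
```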

| username: 林夕一指 | Original post link

Placement Rules in SQL can achieve the eventual separation of hot and cold data. However, it does not guarantee the separation during the transition, so hot data may occasionally land on cold storage in a way the business can notice; you need to evaluate this trade-off yourself. This was my conclusion on version 6.5.0, and I am not sure whether later versions have optimized it.

| username: 小龙虾爱大龙虾 | Original post link

If you want a cold backup, just export it, then delete it from the database.
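
A sketch of that flow with Dumpling; the host, database, table, and column names here are assumptions:

```shell
# Export rows older than two years with Dumpling; verify the files
# before deleting the same rows from the cluster.
tiup dumpling -h 127.0.0.1 -P 4000 -u root \
  --filetype sql -o /backup/msg_archive \
  --filter "msgdb.messages" \
  --where "created_at < DATE_SUB(NOW(), INTERVAL 2 YEAR)"
```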

| username: YuchongXU | Original post link

It can only be handled manually on a regular basis.

| username: zhaokede | Original post link

On the business side, cold data can be transferred to another database.

| username: TiDBer_HUfcQIJx | Original post link

Use a scheduled task to migrate data older than 3 months to a historical database; a sketch follows.
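
A sketch of what such a job might run within one cluster, assuming a `messages` table with an indexed `created_at` column and a pre-created `messages_history` table (all names hypothetical):

```sql
-- Copy rows older than 3 months into the history table. In practice,
-- compute one fixed cutoff timestamp and use it in both statements,
-- so no row crosses the boundary between the copy and the delete.
INSERT INTO msgdb.messages_history
SELECT * FROM msgdb.messages
WHERE created_at < '2024-01-01 00:00:00';

-- Remove the migrated rows in small batches instead of one huge
-- transaction (non-transactional DML, available since TiDB v6.1)
BATCH ON id LIMIT 10000
DELETE FROM msgdb.messages
WHERE created_at < '2024-01-01 00:00:00';
```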

| username: Kamner | Original post link

Recently, I’ve been working on data archiving as well. Here’s how I do it:

  1. Use BR to perform a full backup of the tables that need to be archived (see the BR sketch after this list);
  2. Use Dumpling to filter and export the previous day’s data daily. This requires a date field, and you can also export by month or quarter depending on your needs;
  3. Use TTL to automatically clean up expired data. For example, if the online database needs to keep 3 months of data, you can set TTL = 4 months, leaving an extra month of redundancy for queries (see the TTL sketch after this list);
    Use TTL (Time to Live) to Periodically Delete Expired Data | PingCAP Documentation Center
  • The files exported by Dumpling can be compressed and stored on cheaper storage such as NAS.
  • I tested the export speed of Dumpling against mysqldump. With the same filter conditions, exporting a 4 GB target SQL file takes about 30 minutes with Dumpling and only 2 minutes with mysqldump, so I chose mysqldump as the tool for daily exports.
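
For step 1, a sketch of a table-level BR backup; the PD address and storage path are placeholders, and BR also accepts s3:// and other external storage schemes:

```shell
# Full backup of one table to local storage on the cluster nodes
tiup br backup table --db msgdb --table messages \
  --pd "127.0.0.1:2379" \
  --storage "local:///backup/messages-full"
```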
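For step 3, the TTL table attribute from the linked document, shown with an assumed `created_at` column and the 4-month window mentioned above:

```sql
-- Rows become eligible for background deletion 4 months after creation
ALTER TABLE msgdb.messages TTL = created_at + INTERVAL 4 MONTH;
ALTER TABLE msgdb.messages TTL_ENABLE = 'ON';
```
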
| username: Jellybean | Original post link

The original poster can refer to this solution, which stores 300 TB of data. In the same TiDB cluster, hot data from the last 3 months is kept on NVMe drives, while cold data older than 3 months is kept on regular HDDs. It has been running for several years.