Should a table with a daily increment of tens of millions of rows consider sharding?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 日增量千万的表需要考虑分表吗?

| username: TiDBer_QHSxuEa1

For a log table with a daily increase of tens of millions of records, retaining data for one month, and frequent writes due to recording user actions with relatively fewer queries, should this type of table consider sharding?

| username: 大飞哥online | Original post link

It is already partitioned, keeping one month, so the historical data can be archived.

| username: 大飞哥online | Original post link

You can create tables based on time, and archive historical months directly, which is convenient for maintenance.

| username: zhanggame1 | Original post link

TiDB does not need sharding; partitioning is sufficient. You can partition by month and periodically delete the partitions.

| username: Jellybean | Original post link

There is no need for sharding, but it is recommended to create partitioned tables and then clean up expired partition data daily. The latest version has row-level TTL, which makes it even more convenient.

We also have a similar scenario with read and write log tables, where billions of rows are inserted daily and data is retained for a year. Writes are quite frequent while reads are relatively less. We use TiDB for hot and cold data archiving, and the performance and efficiency of access are quite good.

| username: Kongdom | Original post link

It still depends on resources. If resources are sufficient, there is no need to split tables. If resources are insufficient, then tables must be split.

| username: 有猫万事足 | Original post link

No need to split the table. TiDB already separates data at the region level. However, it is recommended to use partitioning because you only keep one month’s data. Without partitioning, deleting data older than one month would be costly. Additionally, since this table has frequent writes, you need to be aware of write hotspots. Other than that, there’s nothing much to worry about.

| username: TiDBer_QHSxuEa1 | Original post link

May I ask if archiving hot and cold data means putting older historical data into a historical table?

| username: Jellybean | Original post link

No, for the user, it is still a single table.

Within the same table, we divide it into different partitions, specifying that certain historical partitions are stored on regular HDD cold disks (e.g., data from three months ago), and the latest partitions are stored on SSD hot disks (e.g., data from the last three months). This approach meets the needs for data archiving, improving data access performance, and reducing costs.

You can refer to this article we wrote earlier for more details: 专栏 - TiDB 冷热存储分离解决方案 | TiDB 社区

| username: redgame | Original post link

Partitioning is possible.

| username: YuchongXU | Original post link

The underlying layer has already been sharded by default.

| username: TiDBer_QHSxuEa1 | Original post link

Okay, thank you.

| username: cy6301567 | Original post link

It is a distributed database, so you can consider partitioning at the underlying level and then migrating and backing up cold data.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.