Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 请问大家的数据库最大的表有多少行数据
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]
I would like to know about everyone’s data capacity situation. We have a table with 6 billion rows. Will there be any performance issues with point queries?
No performance issues with partitioning for up to hundreds of millions of rows.
No problem, I’ve handled projects at the hundred-billion level before.
If it can handle point queries, what are you worried about?
Our largest is over 5 billion, no need to worry about point queries.
For our partitioned table, the maximum number of rows in a single table is 60 billion. Point queries are very fast and there’s no difference compared to regular tables. With TiFlash, even a full table count(*) query can be completed in about 1 minute. Overall, it’s quite good.
May I ask what is meant by partitioned table here? I just started using TiDB and don’t quite understand, would like to ask for some guidance.
Partitioned tables are a special type of table organization in TiDB. You can search the official documentation for more details; the official website explains it very clearly.
How large does a table need to be to require partitioning, or in what scenarios is it needed?
The main function of partitioning in TiDB is to facilitate data deletion. You can directly use DDL to delete that partition. Deleting large tables with ordinary DELETE is very troublesome and requires splitting the SQL to delete in segments.
TiDB’s underlying storage, through region division and scheduling, naturally supports the storage of extremely large tables and has perfectly solved the problem of sharding in traditional single-machine databases.
A table usually has over a hundred million rows, and only when there are specific business needs do we set it as a partitioned table when creating a new table. In our case, the use of partitioned tables is more for the convenience of operations and maintenance, as well as to utilize the characteristics of hot and cold storage.
- Operations and maintenance: Mainly for cleaning up historical expired data.
TiDB’s delete operation is a transactional operation. When a large amount of data (over a million rows) needs to be deleted at once, it requires executing a super-large transaction, which can take hours or even fail due to the transaction being too large. By using the partitioned table feature, we can place expired data (by day, week, month, etc.) in a partition, which can be deleted very quickly (in seconds), achieving the purpose of data deletion. Hours vs. seconds, the choice is obvious.
- Using the partition feature, certain partitions can be placed on regular HDDs, while important partitions can be placed on NVME SSDs, allowing administrators to flexibly control data storage and reduce storage costs.
There are many other application scenarios for partitioned tables, and the specific design and use depend on the characteristics of the business itself.
Thank you very much for your detailed answer.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.