How much data can a single table in TiDB support?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB单表能够支持多大数据量?

| username: Atlan

Is there a significant difference in query latency between a single table and sharded tables under the minimum deployment architecture of TiDB when the data volume is the same? Or, what is the boundary between sharding and not sharding?

| username: TiDBer_jYQINSnf | Original post link

Reading a piece of data on the KV side is always a kvget request. The only difference between different tables when organizing keys is the prefix, which has no impact on performance. One advantage of partitioning tables is that deleting a partitioned table is simpler than deleting a batch of data from a large table, and it has less impact on performance.

| username: tidb菜鸟一只 | Original post link

There’s no need to split tables in TiDB anymore; you can partition them. If there are OLTP requests, you can add TiFlash.

| username: dba-kit | Original post link

Apart from scenarios requiring hot and cold data separation or archiving, partition tables are generally not used. There’s even less need for sharding. If sharding is acceptable, then MySQL can be used instead, making TiDB unnecessary.

| username: 胡杨树旁 | Original post link

TiDB itself already partitions the table, right? If it is scattered, it should be distributed across different regions and different nodes.

| username: Jellybean | Original post link

TiDB clusters are not recommended for sharding tables. The primary motivation for its creation was to address the issue of sharding databases and tables in MySQL. It is recommended to use large tables directly for storage. Some companies have tables with trillions of rows, and single tables can reach hundreds of terabytes. For point queries, the difference is minimal; for range scans, it depends on the amount of data you are scanning.

As for whether to shard tables or not, it should be considered based on business and operational management situations. Typically, storing tens of billions of rows in a single table is a very common scenario.

| username: Jellybean | Original post link

When the data volume is the same for a single table and sharded tables, there might be some differences in query latency. However, the specific difference depends on factors such as data distribution, query conditions, index usage, etc. Under normal conditions, this latency difference might not be very significant.

In practical scenarios, the boundary for sharding depends on various factors, such as table size, query load, hardware configuration, and so on.

| username: xingzhenxiang | Original post link

The largest single table I have experienced so far has 12 billion rows.

| username: zhanggame1 | Original post link

TiDB is designed to avoid sharding; when there is a large amount of data, use partitioned tables.

| username: 胡杨树旁 | Original post link

Could you please tell me if there are any best practices for setting parameters when making DDL changes?

| username: tony5413 | Original post link

The official documentation does not impose a limit on a single table. I think it depends on the data type of the single table, hardware performance, TiDB version, etc.

| username: xingzhenxiang | Original post link

DDL is very fast, the index is directly thrown to the background.

| username: forever | Original post link

Now all MySQL databases are being merged into TiDB.

| username: zhanggame1 | Original post link

Is there a need for partitioning if the table data volume is very large?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.