At what point does the performance of a TiDB table significantly degrade due to the amount of data stored?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb一个表存多少数据,性能会明显下降?

| username: TiDBer_y9IRzLWc

How much data can a table in TiDB store before performance significantly degrades?

| username: Kongdom | Original post link

If resources are sufficient, there should be no hard limit on row count. On our side, a single table with over 500 million rows works fine.

| username: TiDBer_y9IRzLWc | Original post link

I have a question. We have a large table with 4 billion records, and inserting data is relatively slow: it seems we can write at most about 10 MB of data per second. Is this normal, or are there any good optimization options? I am new to TiDB and not very familiar with it yet.

| username: tidb菜鸟一只 | Original post link

How many TiKV nodes are there, what type of disks are being used, and is there a write hotspot? You can check the dashboard to see which part of the insert statement is slow.
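
If you don't have dashboard access yet, a quick way to check for write hotspots from a SQL client is the `TIDB_HOT_REGIONS` system table (a sketch; the exact columns vary by TiDB version):

```sql
-- List the top current write hotspots as reported by PD.
SELECT db_name, table_name, index_name, type, region_count, flow_bytes
FROM information_schema.tidb_hot_regions
WHERE type = 'write'
ORDER BY flow_bytes DESC
LIMIT 10;
```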

| username: TiDBer_y9IRzLWc | Original post link

Okay, thank you. It was deployed by the operations team. I will check it out later.

| username: 像风一样的男子 | Original post link

Could you please share the dashboard monitoring?

| username: Kongdom | Original post link

Check whether IO is saturated in the Grafana monitoring~

| username: zhanggame1 | Original post link

10 MB/s is not slow. That's 36 GB per hour, nearly 1 TB per day. First, check the disk usage and CPU usage.
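
For reference, the arithmetic behind that estimate:

$$
10\ \text{MB/s} \times 3600\ \text{s/h} = 36\ \text{GB/h}, \qquad 36\ \text{GB/h} \times 24\ \text{h} = 864\ \text{GB/day} \approx 0.86\ \text{TB/day}
$$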

| username: TiDBer_y9IRzLWc | Original post link

Yes, our TiDB is self-hosted and all configurations are at their defaults. We are currently running 5 TiKV nodes with NVMe SSDs. Is there any room for optimization, such as modifying configuration files?
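
If the table has an auto-increment or otherwise monotonically increasing key, sequential inserts all land on one Region no matter how many TiKV nodes there are. A minimal sketch of the standard mitigations, assuming a hypothetical table named `big_table`:

```sql
-- Scatter the hidden row IDs of an existing table (one whose primary key
-- is not the row ID) across 2^4 = 16 shards, so bulk inserts spread over
-- multiple Regions instead of piling onto one.
ALTER TABLE big_table SHARD_ROW_ID_BITS = 4;

-- For a new table, AUTO_RANDOM instead of AUTO_INCREMENT avoids the
-- monotonic key entirely; PRE_SPLIT_REGIONS = 2 pre-splits the table
-- into 2^2 = 4 Regions at creation time.
CREATE TABLE big_table_v2 (
    id BIGINT PRIMARY KEY AUTO_RANDOM,
    payload VARCHAR(255)
) PRE_SPLIT_REGIONS = 2;
```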

| username: cy6301567 | Original post link

That depends on the machine configuration and the number of TiKV nodes. Essentially, the data is distributed across the TiKV nodes, and the most important factor is IO.

| username: redgame | Original post link

This is difficult to test.

| username: breakyang | Original post link

In TiKV, each Region is 96 MB by default, but it is best not to manage more than about 4 TB of data on a single TiKV instance; beyond that, the sheer number of Regions produces excessive heartbeats between TiKV and PD, leading to performance degradation.
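
The relevant settings can be inspected from a SQL client (a sketch; config names can differ across TiKV versions):

```sql
-- Check the Region split size and Region hibernation on the TiKV side.
-- raftstore.hibernate-regions (on by default in recent versions) stops
-- heartbeats for idle Regions, which mitigates exactly this problem.
SHOW CONFIG WHERE type = 'tikv'
  AND name IN ('coprocessor.region-split-size', 'raftstore.hibernate-regions');
```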

| username: ShawnYan | Original post link

Please refer to

| username: Kongdom | Original post link

It seems I need to visit the column more often. I hadn't seen such good documentation before.

| username: TiDBer_y9IRzLWc | Original post link

Sure, thank you very much.

| username: cy6301567 | Original post link

Good

| username: zhaokede | Original post link

Both the number of rows and the size of the table should have an impact. Reading through everyone's replies, the post mentions 4 billion+ rows but does not give the approximate size of the table.

| username: realcp1018 | Original post link

We have a single table with around 100 billion rows, deployed in a three-data-center disaster recovery setup; QPS is over 20,000 with P99 within 100 ms. We also have a cluster with nearly 200 TB of data, with QPS under 10,000 and P99 under 30 ms.

There are many factors that affect performance; data volume is just one of them. Off the top of my head, they include:

  1. Server performance: disk IOPS, bandwidth, network bandwidth, CPU, memory size, etc.
  2. Number of records in the table, length of a single record
  3. Access method to the table, whether it’s point queries or large range queries, read-write ratio, QPS, and whether the indexes are reasonable and complete
  4. Whether there is cross-AZ communication between cluster nodes, and the network latency between AZs
  5. Whether the Store data volume exceeds the preset ratio, causing scheduling to consume too many resources, etc. (see the query sketch after this list)
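
For point 5, a quick way to check per-store data volume and Region distribution from SQL is the `TIKV_STORE_STATUS` system table (a sketch; available columns vary slightly by version):

```sql
-- Per-store capacity, available space, and Region/leader distribution,
-- as reported by PD; useful for spotting an over-full Store.
SELECT store_id, address, capacity, available, leader_count, region_count
FROM information_schema.tikv_store_status;
```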


This table previously held over 150 billion rows of data; we later cleaned out more than 100 billion, leaving over 20 billion.

Here's a screenshot of another cluster at hand: [screenshot omitted]

| username: zhanggame1 | Original post link

Is your disaster recovery implemented using TiDB’s distributed architecture or data synchronization?

| username: realcp1018 | Original post link

Particularly important data should additionally be synchronized to another cluster or to other heterogeneous storage; for ordinarily important data, a three-data-center deployment is sufficient.