Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: tidb一个表存多少数据,性能会明显下降?

How much data can a table in TiDB store before performance significantly degrades?
If resources are sufficient, there should be no hard limit on row count. On my side, a single table with over 500 million rows runs fine.
I have a question. We have a large table with 4 billion records, and inserts are relatively slow; it seems we can write at most about 10 MB of data per second. Is this normal, or are there any good optimization approaches? I am new to TiDB and not very familiar with it yet.
How many TiKV nodes are there, what type of disks are being used, and is there a write hotspot? You can check the dashboard to see which part of the insert statement is slow.
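(Not from the original thread.) A minimal sketch of one way to check for a write hotspot from SQL, assuming a TiDB endpoint on 127.0.0.1:4000, the `pymysql` client, and a placeholder table name `test.my_big_table`:

```python
# Minimal sketch: look for write hotspots on one table via information_schema.
# Assumptions: pymysql is installed, TiDB listens on 127.0.0.1:4000 with default
# credentials, and "my_big_table" in database "test" is a placeholder name.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="", database="test")
try:
    with conn.cursor() as cur:
        # Regions that TiDB currently reports as hot for writes on this table.
        cur.execute(
            "SELECT * FROM information_schema.tidb_hot_regions "
            "WHERE type = 'write' AND table_name = %s",
            ("my_big_table",),
        )
        for row in cur.fetchall():
            print(row)

        # Region/leader layout of the table; leaders piled on one store
        # usually indicate a write hotspot.
        cur.execute("SHOW TABLE test.my_big_table REGIONS")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```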
Okay, thank you. It was deployed by the operations team. I will check it out later.
10 MB per second is not slow. That's 36 GB per hour, nearly 1 TB per day. First, check the disk usage and CPU usage.
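A quick back-of-the-envelope check of those numbers, assuming "10M" means 10 MB per second (decimal units):

```python
# Back-of-the-envelope check of the throughput figures quoted above.
mb_per_second = 10
gb_per_hour = mb_per_second * 3600 / 1000   # 36 GB per hour
tb_per_day = gb_per_hour * 24 / 1000        # ~0.86 TB per day, i.e. "nearly 1T"
print(f"{gb_per_hour:.0f} GB/hour, {tb_per_day:.2f} TB/day")
```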
Yes, our TiDB cluster is self-built, and all configurations are at their defaults. We are currently running 5 TiKV nodes with NVMe SSDs. Is there any room for optimization, such as tuning the configuration files?
That depends on the machine configuration and the number of TiKV nodes. Essentially, the data is distributed across the TiKV nodes, and the most important factor is IO.
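(Not from the original reply.) A small sketch, under the same assumptions as above (pymysql, TiDB on 127.0.0.1:4000), to see how regions and leaders are spread across the TiKV stores, which is where an IO imbalance would show up:

```python
# Sketch: check how evenly regions/leaders are spread across TiKV stores.
# Assumes pymysql and a TiDB endpoint on 127.0.0.1:4000 with default credentials.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT store_id, address, leader_count, region_count, capacity, available "
            "FROM information_schema.tikv_store_status"
        )
        for row in cur.fetchall():
            print(row)  # a large skew here tends to surface as IO pressure on one node
finally:
    conn.close()
```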
In TiKV, each region is 96 MiB by default, but it is best not to let a single TiKV node hold more than 4 TB of data; beyond that, the heartbeats between TiKV and PD become excessive and performance degrades.
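(Also not part of the original reply.) The region-size settings mentioned above can be read back from SQL with SHOW CONFIG (TiDB v4.0 or later); a minimal sketch under the same assumptions:

```python
# Sketch: read the region-size related TiKV settings via SHOW CONFIG.
# Assumes pymysql and a TiDB endpoint on 127.0.0.1:4000 (SHOW CONFIG needs TiDB >= v4.0).
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'coprocessor.region%'")
        for conf_type, instance, name, value in cur.fetchall():
            print(instance, name, value)  # e.g. coprocessor.region-split-size = 96MiB
finally:
    conn.close()
```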
It seems I need to visit the column section more often; I hadn't come across such good documentation before.
Both the number of rows and the size of the table should have an impact. Reading through everyone's replies, one of them mentioned 4 billion+ rows but did not say roughly how large the table is.
We have a single table with around 100 billion rows of data, implemented with a three-data-center disaster recovery setup. The QPS is over 20,000 with P99 within 100ms. We also have a cluster with nearly 200TB of data, with QPS under 10,000 and P99 under 30ms.
There are many factors that affect performance, and data volume is just one aspect. From what I can think of in a short time, the factors affecting performance include:
Here’s another screenshot of a cluster at hand:
Is your disaster recovery implemented using TiDB’s distributed architecture or data synchronization?
Particularly important data should also be synchronized to another cluster or stored on other heterogeneous storage; for ordinarily important data, a three-data-center deployment is sufficient.