At what data volume does TiDB offer the best cost-performance ratio?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据量满足多少的情况下TiDB的收益和支出性价比最高

| username: 友利奈绪

At what data volume do TiDB's benefits best justify its costs? Thanks in advance for any suggestions!

| username: TiDBer_jYQINSnf | Original post link

There is no specific number, but a general rule of thumb is: if MySQL can’t handle the load anymore and you’re out of options, then use TiDB. If MySQL is working well for you, there’s no need for TiDB.

| username: 这里介绍不了我 | Original post link

Personally, I think it depends on the scale of the business. If MySQL reaches a bottleneck that cannot be resolved, you can try switching to TiDB.

| username: redgame | Original post link

We are above 3 TB.

| username: TiDBer_aaO4sU46 | Original post link

When the boss says to use it, we use it.

| username: changpeng75 | Original post link

The amount of data is certainly important, but it also depends on the composition of the data, the application, and other specific circumstances.

| username: TIDB-Learner | Original post link

When the data volume on a single instance is too large, you don’t want to shard into multiple databases and tables, and there are a lot of analytical queries.

| username: tidb狂热爱好者 | Original post link

You should start considering TiDB at around 200 GB.

| username: xfworld | Original post link

You need a better solution when MySQL becomes very painful to use in your scenario and the operational burden is high. If your business has clear performance metrics, you can run some proofs of concept (POCs) comparing MySQL and TiDB and use the results to weigh the actual benefits against the costs.

| username: DBAER | Original post link

It depends on concurrency, hardware, and feature requirements.

| username: Thedeep | Original post link

Generally, two indicators are considered:

  1. The number of rows in a single table. The industry reference value for MySQL-based OLTP systems is 20 million rows per table. Once a table reaches this size, you either need to shard the database or consider another database, such as TiDB.
  2. Data volume, with a general reference value of 1 TB to 2 TB. At this scale, TiDB offers significant benefits.
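
The two indicators above can be written down as a simple check. This is a minimal Python sketch that treats the thread's community reference values (20 million rows per table, and the 1 TB lower end of the 1 TB to 2 TB range) as hard cutoffs, which they are not; the names `MySQLWorkload` and `tidb_worth_evaluating` are hypothetical. A non-empty result is only a prompt to run a POC, not a verdict.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rule-of-thumb thresholds quoted in this thread (assumed cutoffs, not official TiDB limits).
ROWS_PER_TABLE_THRESHOLD = 20_000_000  # ~20 million rows per MySQL table in OLTP systems
DATA_VOLUME_LOW_TB = 1.0               # lower end of the 1 TB to 2 TB reference range


@dataclass
class MySQLWorkload:
    """Hypothetical summary of an existing MySQL deployment."""
    largest_table_rows: int
    total_data_tb: float


def tidb_worth_evaluating(w: MySQLWorkload) -> list[str]:
    """Return the reasons (if any) the thread's two indicators suggest evaluating TiDB."""
    reasons = []
    if w.largest_table_rows >= ROWS_PER_TABLE_THRESHOLD:
        reasons.append("single table at or above ~20 million rows")
    if w.total_data_tb >= DATA_VOLUME_LOW_TB:
        reasons.append("total data volume in or above the 1-2 TB range")
    return reasons


print(tidb_worth_evaluating(MySQLWorkload(largest_table_rows=35_000_000, total_data_tb=0.4)))
# ['single table at or above ~20 million rows']
```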

TiDB 7.1 also offers Resource Control, which lets multiple small databases be consolidated into one cluster. This simplifies maintenance and isolates resources between them, while each database still gets high availability and scalability. Following this logic, you no longer really need to consider row counts or data volume at all.

| username: 小于同学 | Original post link

I think if MySQL requires sharding, then it’s time to consider using TiDB.

| username: tidb菜鸟一只 | Original post link

Sharding is absolute garbage; anyone who uses it is an idiot. If your MySQL can’t handle the load and needs sharding, just switch to TiDB. Messing around with sharding will only turn you into an idiot.

| username: TiDBer_jYQINSnf | Original post link

Be open to new things, but don’t be too quick to abandon the old ones :crazy_face:. Before TiDB, sharding solved many problems, and many large businesses still don’t fully trust TiDB and continue to use sharding. Although sharding has many limitations, it has made significant contributions. Its transaction support may not be great, but keeping data in MySQL gives many veteran operations engineers a sense of security.

| username: yiduoyunQ | Original post link

As I understand it, there are a few cases:

  1. When the total data volume exceeds what a single machine can hold, you can choose either a sharding solution or a distributed database.
  2. When the hot data exceeds a single machine’s memory, workloads such as random reads and writes cause a large amount of disk I/O, and single-machine performance drops sharply.
  3. HTAP scenarios with strict real-time requirements. You can look up related material on your own.
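
The first two cases are plain capacity comparisons, which a short Python sketch can make concrete. The function name `single_node_pressure` and all inputs are hypothetical, and the assumption that about 75% of RAM is usable as cache is illustrative only (loosely in line with common advice to give the InnoDB buffer pool most of the memory on a dedicated server). This is not a capacity-planning model.

```python
from __future__ import annotations

# Assumption (hypothetical): about 75% of RAM is available for caching hot data.
USABLE_MEMORY_FRACTION = 0.75


def single_node_pressure(total_data_gb: float, hot_data_gb: float,
                         disk_capacity_gb: float, memory_gb: float,
                         usable_memory_fraction: float = USABLE_MEMORY_FRACTION) -> dict[str, bool]:
    """Flag the two single-machine limits: total data vs. disk, hot data vs. cache."""
    cache_gb = memory_gb * usable_memory_fraction
    return {
        # Case 1: total data no longer fits on one machine's storage.
        "exceeds_disk": total_data_gb > disk_capacity_gb,
        # Case 2: the hot working set no longer fits in the cache, so random
        # reads and writes increasingly go to disk.
        "hot_set_exceeds_memory": hot_data_gb > cache_gb,
    }


print(single_node_pressure(total_data_gb=1500, hot_data_gb=200,
                           disk_capacity_gb=4000, memory_gb=128))
# {'exceeds_disk': False, 'hot_set_exceeds_memory': True}
```

Here the data still fits on disk, but 200 GB of hot data exceeds the roughly 96 GB of cache, which is the situation the second case describes.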

| username: paulli | Original post link

If your scenario combines online transactional workloads with analytical ones, you can consider TiDB. TiKV provides efficient primary-key point gets, and TiFlash handles complex analytical queries over large volumes of data.

| username: 小龙虾爱大龙虾 | Original post link

With resource control in newer versions, small internal enterprise systems can be consolidated onto one cluster, which seems like a good fit.

| username: Jellybean | Original post link

TiDB makes sense in OLTP scenarios where single tables exceed 20 million rows.

With resource control, multiple businesses can share a single cluster with multi-tenant isolation, which also covers scenarios with small data volumes. Unless you need the utmost stability, it is reasonable to put small-scale workloads in the TiDB cluster as well.
