Is it necessary to merge sharded databases and tables from RDB to TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: RDB的分库分表到TIDB有必要合并起来么?

| username: vincentLi

[TiDB Usage Environment] Production Environment
[TiDB Version]
The ORACLE database split an account business table into 1024 sub-tables for better write performance. Now, when migrating to TiDB, is it necessary to merge them back into a single table? The benefit of doing so is that the program logic might be simplified, but will there be any performance issues?

| username: 小龙虾爱大龙虾 | Original post link

Do a test.

| username: srstack | Original post link

TiDB exists to eliminate the need for sharding and partitioning~ You can test it out.

| username: jiayou64 | Original post link

TiDB itself is distributed, so there’s no need for sharding. Specifically, you should set up an environment for testing.

| username: FutureDB | Original post link

Indeed, many people use TiDB to avoid sharding and partitioning.

| username: Jack-li | Original post link

Practice yields true knowledge.

| username: lemonade010 | Original post link

Our database, which was migrated from RDS, has sharding and partitioning. We migrated it directly to TiDB without merging, and the performance is quite good.

| username: 濱崎悟空 | Original post link

Not changing the original reasonable design may not be a bad thing; altering the structure for the target database also incurs significant costs.

| username: zhaokede | Original post link

Performance needs to be tested with your own business; it’s difficult for outsiders to evaluate accurately.

| username: zhaokede | Original post link

It is best to use business code for stress testing.

| username: vincentLi | Original post link

None of the answers from the experts are feasible, haha. We are currently in the feasibility assessment stage, which means we need to research whether there are actual examples to refer to and compare, and whether there is a need for modification, as well as the pros and cons. While we are still assessing feasibility, you are asking me to overhaul the system and stress test it before deciding whether to take action. The cost is too high, isn’t it?

| username: zhh_912 | Original post link

It needs to be judged based on the business situation.

| username: TiDBer_ZxWlj6A1 | Original post link

Shouldn’t this be made into a partitioned table in Oracle? In TiDB, it seems there’s no difference between making it one table or multiple tables. Both are based on regions. If there was a business requirement in Oracle to query the contents of two partitioned tables, it would be better to keep it as one table now.

| username: zhaokede | Original post link

When migrating partitioned tables from an Oracle database to TiDB, whether to merge 1024 partitioned tables back into a single table depends on multiple factors, including data size, query patterns, write frequency, and TiDB’s performance characteristics. Here is an analysis of the potential benefits and possible performance issues of merging partitioned tables into a single table:

Benefits of Merging into a Single Table:

  1. Simplified Program Logic: Reducing the number of tables simplifies the program logic, eliminating the need to handle complex partitioning logic, such as routing data to different tables based on certain conditions.
  2. Easier Management: Maintaining one table is easier than maintaining multiple tables, making operations like backup, recovery, and monitoring more straightforward.

Possible Performance Issues:

  1. Write Performance: Although TiDB is a distributed database capable of handling large-scale data and high-concurrency writes, writing all data into a single table may create write hotspots, leading to decreased write performance. This needs to be evaluated based on the actual write patterns and TiDB cluster configuration.
  2. Query Performance: For complex queries, especially those involving large amounts of data, partitioned tables can improve query performance by reducing the data scan range. After merging all data into a single table, if the queries are not optimized for TiDB’s distributed characteristics, query performance may decline.
  3. Data Distribution and Load Balancing: TiDB achieves data distribution and load balancing through its distributed architecture. With partitioned tables, data distribution and load balancing can be optimized by adjusting the partitioning strategy. After merging all data into a single table, it is essential to ensure that the TiDB cluster configuration can handle data distribution and load balancing effectively to avoid hotspots on certain nodes.

Suggested Solutions:

  1. Evaluate Data Size and Query Patterns: Before deciding whether to merge partitioned tables, evaluate factors such as data size, query patterns, and write frequency. If the data size is small, query patterns are simple, and write frequency is low, merging partitioned tables might be a feasible solution.
  2. Performance Testing: Before merging partitioned tables, conduct performance tests, including write performance tests and query performance tests. Testing can help evaluate the impact of merging partitioned tables on performance and make necessary optimizations and adjustments based on the test results.
  3. Consider TiDB’s Distributed Characteristics: When merging partitioned tables, fully utilize TiDB’s distributed characteristics to optimize data distribution and load balancing. For example, adjust TiDB’s partition key strategy to optimize data distribution or use TiDB’s load balancing features to ensure even data distribution across nodes.

In summary, when deciding whether to merge partitioned tables from an Oracle database into a single table in TiDB, consider multiple factors, including data size, query patterns, write frequency, and TiDB’s performance characteristics. By evaluating these factors and conducting performance tests, you can choose the most suitable solution.

| username: 霸王龙的日常 | Original post link

The reason for using TiDB is to avoid sharding, so I suggest merging. For production, without actual comparison data, the leadership won’t be convinced. It’s best to test it practically.

| username: Kongdom | Original post link

As long as the hardware is in place, performance generally won’t be a big issue. Of course, this is not an excuse for developers to write poor SQL. :yum:

| username: DBAER | Original post link

Specific test scenarios are needed, and the level of cooperation from the business is also required, as code modifications are necessary after all.

| username: 小龙虾爱大龙虾 | Original post link

For transformation testing in typical scenarios, this is how we do it. Otherwise, who can help you evaluate? The same data volume, different scenarios, and different development levels will yield different results. If you are only doing point queries, then you don’t need to split the tables.

| username: tidb狂热爱好者 | Original post link

Suggest merging tables.

| username: chenhanneu | Original post link

Some of the migrations were merged, while others were not. We don’t want to make changes, so we just need to move the partitioned tables directly to meet the requirements. Some merges resulted in performance degradation, so we had to partition them again.

However, for the sake of overall simplicity, if the performance meets the requirements, the developers are willing to make changes, and we will merge them if possible. Partitioned tables have a higher probability of bugs compared to regular tables, so the simpler, the more stable.