Performance Benchmark Comparison of TiDB/PolarDB/TDSQL-C/GaiaDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB/PolarDB/TDSQL-C/GaiaDB性能压测对比

| username: 我是吉米哥

I saw a database test report online, and its conclusion is that TiDB performs worse than the other three.
Overall performance ranking: Alibaba Cloud > Baidu Intelligent Cloud > Tencent Cloud > TiDB

Logically, TiDB should perform better in batch inserts (LSM-Tree sequential IO) and point query scenarios (PointGet).

I am a TiDB newbie, could experienced users please explain this? Thank you.

Original article link:

| username: dba-kit | Original post link

With a dataset this small, a standalone machine will definitely run faster. Query latency is already very low at this data volume, so for a distributed database like TiDB that separates storage and compute, network latency accounts for a significant share of the total.
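
To make the point about network latency concrete, here is a minimal back-of-the-envelope sketch. The function name `network_share` and all the millisecond figures are made up for illustration; they are not measurements from the report or from any real cluster.

```python
def network_share(local_ms, rtt_ms, round_trips=1):
    """Return the fraction of total query latency spent on network round trips.

    local_ms:    time spent executing the query itself (hypothetical figure)
    rtt_ms:      one network round trip between compute and storage nodes
    round_trips: number of round trips the query needs
    """
    network_ms = rtt_ms * round_trips
    return network_ms / (local_ms + network_ms)


# Illustrative numbers only: a tiny query on a small table versus a heavy query.
small_query = network_share(local_ms=0.1, rtt_ms=0.5)
heavy_query = network_share(local_ms=50.0, rtt_ms=0.5)
print(f"small query: {small_query:.0%} network, heavy query: {heavy_query:.0%} network")
```

Under these assumed numbers, the same round trip dominates the small query and is negligible for the heavy one, which is why a benchmark on a small dataset tends to favor a single machine.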

| username: ShawnYan | Original post link

Practice leads to true knowledge. I suggest trying TiDB yourself first to get more familiar with it, and then look at the reviews.

| username: dba-kit | Original post link

TiDB is suited to high-concurrency OLTP scenarios. It becomes the more appropriate choice once your business hits single-machine bottlenecks. In tests with small data volumes, MySQL or the so-called cloud-native databases built on MySQL will definitely outperform TiDB.

| username: dba-kit | Original post link

Additionally, LSM-Tree is characterized by fast writes and slow reads. On top of that, tidb-server does not cache data, so PointGet still has to fetch data from TiKV and therefore has no real advantage. In my earlier sysbench stress tests, TiDB did outperform MySQL in the write scenarios, though not in PointGet. (The total machine resources it needs are higher than for a single MySQL instance, but current MySQL setups typically use read-write separation with multiple replicas anyway.)
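
The "fast writes, slow reads" trade-off can be sketched with a toy LSM-style structure. This is only an illustration and not how TiKV or RocksDB is implemented: the class `ToyLSM` is hypothetical, and it has no compaction, Bloom filters, or block cache, all of which real engines use to cut read cost.

```python
import bisect


class ToyLSM:
    """Toy LSM-tree: one in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # oldest run first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # A write only touches the memtable. When the memtable is full,
        # it is flushed as one sorted run (a sequential write on real storage).
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # A read checks the memtable, then every run from newest to oldest,
        # so an old key may cost several probes. Returns (value, probes).
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for run in reversed(self.runs):
            probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes


db = ToyLSM(memtable_limit=2)
for i in range(6):
    db.put(i, f"v{i}")
print(db.get(5))  # newest key: found in the most recent run
print(db.get(0))  # oldest key: every run is probed before it is found
```

In this sketch each write is a single dictionary update, while a point read for an old key has to walk through every run, which is the read penalty described above.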

| username: 有猫万事足 | Original post link

Choose what suits you. There is no silver bullet in the software world.

If MySQL can solve the problem, there’s no need to use TiDB.

Also, yesterday it was DM and Otter, and today it’s TiDB and other databases? Are you on a mission?
If it’s appropriate, add me. I want to earn this money too.

| username: zhaokede | Original post link

What matters is your own experience using it. Comparisons that ignore the business context are meaningless.

| username: TiDBer_QKDdYGfz | Original post link

Distributed systems do not have an advantage when the data volume is small, right?

| username: 裤衩儿飞上天 | Original post link

Simulate real business scenarios and data volumes for testing.
Testing without considering business scenarios and data volumes is meaningless.

| username: Jellybean | Original post link

--table_size=25000 --tables=250

The application scenarios are different, and the data volume in this test is too small. I recommend testing with tables of tens of millions, hundreds of millions, or even billions of rows.

TiDB is a distributed database, and it only shows its value when storing massive amounts of data.
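
For a sense of scale, the arithmetic behind the two sysbench parameters quoted above can be checked with a few lines. The helper names `total_rows` and `table_size_for` are made up for this sketch, and the one-billion-row target is only an example of the "billions of rows" scale.

```python
def total_rows(table_size, tables):
    """Total number of rows sysbench creates: rows per table times table count."""
    return table_size * tables


def table_size_for(target_rows, tables):
    """Rows per table needed to reach target_rows in total (rounded up)."""
    return -(-target_rows // tables)  # ceiling division


# Parameters quoted from the test report.
print(total_rows(25_000, 250))             # 6250000 rows in total
# Hypothetical target: one billion rows spread over the same 250 tables.
print(table_size_for(1_000_000_000, 250))  # 4000000 rows per table
```

So the reported test covers about 6.25 million rows in total, which easily fits on a single machine.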

| username: Billmay表妹 | Original post link

There are many methods and types of data for testing databases, and an analysis built on a few selected segments has its limitations. If you want to see rankings, you can refer to the more authoritative and fair DB-Engines Ranking, a popularity ranking of database management systems that considers the product's technical capabilities, community, public acceptance, and more.

| username: Billmay表妹 | Original post link

It is very common in China that when testing a product, some points are always found to be better than others. We can either conduct the tests ourselves or choose some influential and fair platforms in the industry to look at some data comparisons.

| username: 这里介绍不了我 | Original post link

In your business scenario, test it yourself to see which one better meets your expectations; otherwise, it doesn’t make much sense.

| username: 小龙虾爱大龙虾 | Original post link

It depends on the business scenario; each architecture has its suitable context:
For example, if the scenario is simple and the sharding key is clear, sharding across databases and tables can handle a lot of concurrency.
Another example is if my business doesn’t have much concurrency, a single machine works just fine.

| username: 濱崎悟空 | Original post link

Compare and see based on the scenario.