Why is writing slower than MySQL?

translator_bot · June 21, 2024, 11:28am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 为什么写入比mysql慢呢

| username: TiDBer_JlY1JCJ5

Previously, querying wide tables with MySQL at the tens of millions level was found to be very slow. Therefore, following the operation documentation and the quick start guide, I simulated the deployment of a production environment cluster on a single machine. After installation, I switched to TiDB and found that the query speed improved significantly. However, when measuring TPS for write speed, I found that it was over 400 with MySQL, but only over 200 with TiDB. Could you please explain why the write speed is slower with TiDB and suggest ways to improve it?

translator_bot · June 21, 2024, 11:28am

| username: zhanggame1 | Original post link

Simulating the deployment of a production environment cluster on a single machine will definitely be slower than MySQL. It needs to be deployed according to the official distributed deployment to speed up. Additionally, simulating the deployment of a production environment cluster on a single machine with 3 TiKVs means that each piece of data is written three times, making it even slower. Try deploying with a single TiKV to see if it speeds up.

translator_bot · June 21, 2024, 11:28am

| username: TiDBer_JlY1JCJ5 | Original post link

I see that the minimum requirement is 3 TiKV nodes. Will there be issues if I only deploy one?

translator_bot · June 21, 2024, 11:28am

| username: zhanggame1 | Original post link

Three TiKV instances refer to deploying them on three servers. Deploying them on a single machine with a single hard drive is meaningless and can only be used to test functionality.

translator_bot · June 21, 2024, 11:28am

| username: TiDBer_JlY1JCJ5 | Original post link

Is it necessary to redeploy? Can I shut down the other two now? Is there such a method?

translator_bot · June 21, 2024, 11:28am

| username: zhanggame1 | Original post link

It probably needs to be redeployed, but testing it this way doesn’t make much sense. TiDB’s advantage is with single tables having over 50 million rows; it performs better than MySQL in a distributed cluster. If the data volume is not large or if it’s running on a single machine, it doesn’t make much sense.

translator_bot · June 21, 2024, 11:28am

| username: TIDB-Learner | Original post link

Firstly, deploying a cluster on a single server doesn’t have much effect. The server’s performance cannot meet the requirements. It is recommended to follow the official configuration. 2. Testing a cluster with tens of millions of data on a single server shows significant improvement in query (read) performance, but the write performance is not as good as MySQL. TiDB and TiKV are separated for computation and storage, and PD is the brain. You cannot ignore the network and transmission query latency between them, which adds some GRPC overhead. 3. Optimize indexes, use prepared statements, and if the write volume is large, consider increasing TiDB cache-related configurations and batch processing. 4. If the database volume is not large, the advantages of using a distributed database are not obvious.

translator_bot · June 21, 2024, 11:28am

| username: 像风一样的男子 | Original post link

Comparing the performance of standalone MySQL and TiDB is meaningless; standalone MySQL outperforms TiDB. TiDB addresses scalability issues that MySQL does not.

translator_bot · June 21, 2024, 11:28am

| username: 小龙虾爱大龙虾 | Original post link

Comparing single write response times is meaningless. TiDB suffers from severe write amplification (multiple replicas + RocksDB), along with a distributed storage and compute separation architecture. However, TiDB can handle greater concurrency and more data through horizontal scaling, and it also offers high availability.

translator_bot · June 21, 2024, 11:28am

| username: 路在何chu | Original post link

Your data volume is too small. For testing with tens of millions or hundreds of millions of data, MySQL outperforms TiDB.

translator_bot · June 21, 2024, 11:28am

| username: TiDBer_JlY1JCJ5 | Original post link

For data tests involving tens of millions to hundreds of millions of entries, the read performance is definitely better than MySQL. Will the write performance be superior to MySQL?

translator_bot · June 21, 2024, 11:28am

| username: TiDBer_JlY1JCJ5 | Original post link

What does “compute” refer to here? Why does separating storage and compute affect write performance?

translator_bot · June 21, 2024, 11:28am

| username: 路在何chu | Original post link

Also because of MySQL,

translator_bot · June 21, 2024, 11:29am

| username: TiDBer_JlY1JCJ5 | Original post link

Where should cache-related configurations generally be changed? Should they be modified in the configuration file or through SQL statements in tools like Navicat?

translator_bot · June 21, 2024, 11:29am

| username: 小龙虾爱大龙虾 | Original post link

Because it still needs to send data from the TiDB node to the TiKV node, right?

translator_bot · June 21, 2024, 11:29am

| username: TiDBer_JlY1JCJ5 | Original post link

Comparing single write response time is meaningless. Generally, what metrics are meaningful to compare?

translator_bot · June 21, 2024, 11:29am

| username: TIDB-Learner | Original post link

Execute the statement in the MySQL client for a hot update. Please refer to the official documentation for details. Understand the principles first, and then decide whether to make modifications based on your specific situation.

translator_bot · June 21, 2024, 11:29am

| username: forever | Original post link

You will only see an improvement in insertion performance if you place the three TiKV instances on three separate machines.

translator_bot · June 21, 2024, 11:29am

| username: 小龙虾爱大龙虾 | Original post link

The response time of a single query is definitely meaningful for business, but when comparing distributed databases with centralized databases, centralized databases are certainly faster in response time. You can compare other attributes, such as latency under high concurrent writes and overall throughput.

translator_bot · June 21, 2024, 11:29am

| username: dba远航 | Original post link

To test and compare, it is necessary to fully simulate the production environment; otherwise, the data will be significantly different.