Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 为什么写入比mysql慢呢
Previously, querying wide tables with MySQL at the tens of millions level was found to be very slow. Therefore, following the operation documentation and the quick start guide, I simulated the deployment of a production environment cluster on a single machine. After installation, I switched to TiDB and found that the query speed improved significantly. However, when measuring TPS for write speed, I found that it was over 400 with MySQL, but only over 200 with TiDB. Could you please explain why the write speed is slower with TiDB and suggest ways to improve it?
Simulating the deployment of a production environment cluster on a single machine will definitely be slower than MySQL. It needs to be deployed according to the official distributed deployment to speed up. Additionally, simulating the deployment of a production environment cluster on a single machine with 3 TiKVs means that each piece of data is written three times, making it even slower. Try deploying with a single TiKV to see if it speeds up.
I see that the minimum requirement is 3 TiKV nodes. Will there be issues if I only deploy one?
Three TiKV instances refer to deploying them on three servers. Deploying them on a single machine with a single hard drive is meaningless and can only be used to test functionality.
Is it necessary to redeploy? Can I shut down the other two now? Is there such a method?
It probably needs to be redeployed, but testing it this way doesn’t make much sense. TiDB’s advantage is with single tables having over 50 million rows; it performs better than MySQL in a distributed cluster. If the data volume is not large or if it’s running on a single machine, it doesn’t make much sense.
Comparing the performance of standalone MySQL and TiDB is meaningless; standalone MySQL outperforms TiDB. TiDB addresses scalability issues that MySQL does not.
Comparing single write response times is meaningless. TiDB suffers from severe write amplification (multiple replicas + RocksDB), along with a distributed storage and compute separation architecture. However, TiDB can handle greater concurrency and more data through horizontal scaling, and it also offers high availability.
Your data volume is too small. For testing with tens of millions or hundreds of millions of data, MySQL outperforms TiDB.
For data tests involving tens of millions to hundreds of millions of entries, the read performance is definitely better than MySQL. Will the write performance be superior to MySQL?
What does “compute” refer to here? Why does separating storage and compute affect write performance?
Where should cache-related configurations generally be changed? Should they be modified in the configuration file or through SQL statements in tools like Navicat?
Because it still needs to send data from the TiDB node to the TiKV node, right?
Comparing single write response time is meaningless. Generally, what metrics are meaningful to compare?
Execute the statement in the MySQL client for a hot update. Please refer to the official documentation for details. Understand the principles first, and then decide whether to make modifications based on your specific situation.
You will only see an improvement in insertion performance if you place the three TiKV instances on three separate machines.
The response time of a single query is definitely meaningful for business, but when comparing distributed databases with centralized databases, centralized databases are certainly faster in response time. You can compare other attributes, such as latency under high concurrent writes and overall throughput.
To test and compare, it is necessary to fully simulate the production environment; otherwise, the data will be significantly different.