Write performance, Kubernetes or not, utilizing existing servers?

We’re looking to migrate from MariaDB to TiDB. We currently have 16 MariaDB 10.5.x servers storing data that’s sharded in our application.

Much of our data is transient, only stored for 2 weeks. While we only have about 13 TB of total stored data, we have about 175 TB of writes per month (fairly evenly spread throughout the hours of the month).

Question 1) One of our concerns is write speed. I’ve seen reports that TiDB can be 3 to 6 times slower than MariaDB for writes. Can someone provide the current status and some context on that? We have customer processes that run for, say, 4 hours. If those started taking 12 to 24 hours, that would be a blocker.

Question 2) What are the advantages and disadvantages of using Kubernetes to manage the cluster? Is there a performance hit to using it?

Question 3) Most of our MariaDB servers have 4 TB SSDs that are less than 25% utilized, stay under a load average of 1, and have plenty of RAM. Would it be possible and safe to use some of those servers as TiDB nodes while the MariaDB process stays up, and slowly migrate data from MariaDB to TiDB as our confidence in TiDB grows? To our software, the TiDB cluster would look like one more shard that we could migrate specific groups of data to as slowly or as quickly as we feel comfortable. This would let our TiDB cluster start with many more servers (maybe 10 instead of 3), though each of those servers would share its resources with MariaDB.

Question 4) Any other recommendations based on our situation? Or pitfalls to avoid?

Thanks in advance.

Application environment:

Theoretical production

TiDB version:

Latest

Reproduction method:

N/A

Problem:

Knowing how to get started in production

Resource allocation:

Our current DB servers have 16-thread (8-core) Intel(R) Xeon(R) E-2288G CPUs @ 3.70GHz, 128 GB of RAM, 4 TB enterprise SSDs, and a 10 Gbps NIC.

We could also add servers with more cores, etc.

Attachment:

N/A

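Question 1) How TiDB’s write performance compares with MariaDB’s depends on hardware, workload characteristics, and tuning. Per statement, TiDB writes usually have higher latency than writes to a single MariaDB server, because each transaction passes through the SQL layer, is committed through a distributed transaction protocol, and is replicated via Raft to multiple TiKV nodes (three replicas by default) before it is acknowledged. That is the trade-off for strong consistency and horizontal scalability. Total throughput, on the other hand, grows with client concurrency and with the number of TiKV nodes, so a job that issues writes one at a time is affected much more than one that writes in parallel batches. TiDB has been continuously optimized (for example, v5.0 introduced async commit and one-phase commit to reduce commit latency), and the gap with MariaDB has narrowed considerably since the older comparisons you may have seen.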

To ensure optimal write performance in TiDB, you can consider the following best practices:

  • Properly configure the TiDB cluster according to your workload and hardware specifications. This includes adjusting the number of TiKV nodes, PD nodes, and TiDB nodes, as well as tuning various configuration parameters.
  • Utilize TiDB’s distributed architecture to scale out your cluster horizontally by adding more TiKV nodes. This allows you to distribute the write load across multiple nodes and improve overall write performance.
  • Optimize your schema design and queries to minimize unnecessary writes and improve efficiency. This includes using appropriate data types, indexing strategies, and avoiding unnecessary data duplication.
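To put the write volume from the question into concrete terms, the following back-of-envelope calculation converts 175 TB/month into an average rate. It is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), a 30-day month, and the evenly spread load described in the question. It ignores replication and index overhead, so the result is the logical ingest rate only, not the raw disk write rate.

```python
# Back-of-envelope conversion of a monthly write volume into an average rate.
# Assumptions: decimal units (1 TB = 10**12 bytes), a 30-day month, and load
# spread evenly across the month. Replication and index overhead are ignored.

TB = 10**12
MB = 10**6


def average_write_rate(tb_per_month: float, days_per_month: int = 30) -> dict:
    """Return the average logical write rate in TB/day and MB/s."""
    seconds = days_per_month * 24 * 3600
    bytes_per_month = tb_per_month * TB
    return {
        "tb_per_day": tb_per_month / days_per_month,
        "mb_per_second": bytes_per_month / seconds / MB,
    }


if __name__ == "__main__":
    rate = average_write_rate(175)  # 175 TB/month, from the question
    print(f"{rate['tb_per_day']:.2f} TB/day, {rate['mb_per_second']:.1f} MB/s")
```

Under these assumptions it works out to roughly 5.83 TB/day, or about 67.5 MB/s on average, which is a useful target when benchmarking a candidate cluster.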

For more detailed information on optimizing TiDB performance, you can refer to the TiDB Performance Tuning Guide.

Question 2) TiDB runs on Kubernetes through TiDB Operator, and this has several advantages. Kubernetes provides a scalable, flexible platform for deploying and managing containerized applications, including TiDB. With TiDB Operator you can scale the cluster by adding or removing TiDB, TiKV, and PD instances, perform rolling upgrades, and recover from node failures automatically. Kubernetes also provides load balancing and service discovery, which can improve the availability and reliability of your TiDB cluster. The main disadvantage is operational: you need Kubernetes expertise in-house, and if you don’t already run Kubernetes, it is one more system to operate.

There can be a slight performance overhead compared with running TiDB directly on bare metal or virtual machines. It comes mostly from container networking and, if TiKV is placed on network-attached volumes, from the storage layer. The impact is generally small, and you can keep it low by giving TiKV local SSDs through local persistent volumes and by configuring and tuning your Kubernetes cluster properly.
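For a sense of what TiDB Operator manages, here is a minimal sketch that builds a TidbCluster custom resource as a Python dict and prints it as JSON (kubectl apply -f accepts JSON as well as YAML). The field layout is assumed to follow the basic examples shipped with TiDB Operator; the storage class name local-storage, the storage sizes, the replica counts, and the version string are hypothetical placeholders, so check them against the TiDB Operator documentation for the version you deploy.

```python
import json


def tidb_cluster_manifest(name: str, version: str, tikv_replicas: int = 3,
                          storage_class: str = "local-storage") -> dict:
    """Return a minimal TidbCluster custom resource (TiDB Operator) as a dict."""
    def component(image: str, replicas: int, storage: str) -> dict:
        # PD and TiKV are stateful, so both get a persistent volume request.
        return {
            "baseImage": image,
            "replicas": replicas,
            "storageClassName": storage_class,  # hypothetical local-PV class
            "requests": {"storage": storage},   # hypothetical sizes
            "config": {},
        }

    return {
        "apiVersion": "pingcap.com/v1alpha1",
        "kind": "TidbCluster",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "pvReclaimPolicy": "Retain",
            "pd": component("pingcap/pd", 3, "10Gi"),
            "tikv": component("pingcap/tikv", tikv_replicas, "2000Gi"),
            "tidb": {  # stateless SQL layer, exposed through a Service
                "baseImage": "pingcap/tidb",
                "replicas": 2,
                "service": {"type": "ClusterIP"},
                "config": {},
            },
        },
    }


if __name__ == "__main__":
    # Replace the version string with the release you actually deploy.
    manifest = tidb_cluster_manifest("basic", "v7.5.0", tikv_replicas=5)
    print(json.dumps(manifest, indent=2))
```

Scaling TiKV then amounts to changing spec.tikv.replicas and re-applying the resource; pointing storageClassName at local SSD volumes is how the storage overhead mentioned above is usually avoided.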

For more information on running TiDB on Kubernetes, you can refer to the TiDB on Kubernetes Best Practices.

Question 3) It is possible to use your existing servers that currently run MariaDB as TiDB nodes while keeping the MariaDB process running. This approach allows you to gradually migrate your data from MariaDB to TiDB as your confidence in TiDB grows. However, there are a few considerations to keep in mind:

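  • Make sure the hardware resources (CPU, RAM, disk I/O) on each server can handle the combined workload of MariaDB and TiDB. Your 4 TB SSDs at under 25% utilization and low load averages are a good starting point. Note that TiKV sizes its block cache from total system memory by default, so on hosts shared with MariaDB, set storage.block-cache.capacity explicitly to keep the two from competing for RAM.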
  • Plan the migration process carefully to avoid any disruptions to your application. You can start by migrating specific groups of data to the TiDB cluster and gradually increase the data migration rate as you gain confidence in TiDB’s performance and stability.
  • Monitor the resource usage and performance of both MariaDB and TiDB to ensure that they coexist without impacting each other’s performance.
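Before co-locating TiKV with MariaDB, it helps to check whether the free disk space can hold the data once it is replicated. This sketch assumes TiKV’s default of three replicas, roughly 3 TB free per server (4 TB at about 25% used), and treats the 13 TB figure from the question as the logical data size. compression_ratio is a hypothetical knob for how much TiKV’s storage compression shrinks the data; you would need to measure it on a sample of your own data.

```python
def tikv_disk_per_node(logical_tb: float, nodes: int, replicas: int = 3,
                       compression_ratio: float = 1.0) -> float:
    """Average TiKV disk needed per node, in TB.

    compression_ratio is hypothetical (logical size / on-disk size); 1.0 gives
    no credit for compression, i.e. a conservative upper bound.
    """
    if nodes <= 0 or compression_ratio <= 0:
        raise ValueError("nodes and compression_ratio must be positive")
    return logical_tb * replicas / compression_ratio / nodes


def fits(logical_tb: float, nodes: int, free_tb_per_node: float,
         replicas: int = 3, compression_ratio: float = 1.0) -> bool:
    """True if the replicated data fits in the free space (no headroom kept)."""
    need = tikv_disk_per_node(logical_tb, nodes, replicas, compression_ratio)
    return need <= free_tb_per_node


if __name__ == "__main__":
    free_tb = 4.0 * (1 - 0.25)  # 4 TB SSD, about 25% used by MariaDB
    for ratio in (1.0, 2.0):    # 2.0 is an illustrative, unmeasured ratio
        need = tikv_disk_per_node(13, nodes=10, compression_ratio=ratio)
        print(f"compression {ratio}: {need:.2f} TB/node, fits={need <= free_tb}")
```

With no compression credit, 13 TB × 3 replicas across 10 servers is 3.9 TB per server, more than the ~3 TB free. A gradual migration helps here: TiDB only needs room for the shards moved so far, and MariaDB frees space on the same disks as data leaves it.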

For more information on migrating from MariaDB to TiDB, you can refer to the TiDB Migration Guide.

Question 4) Based on your situation, here are some additional recommendations and pitfalls to avoid:

  • Perform thorough testing and benchmarking before migrating your production workload to TiDB. This will help you understand the performance characteristics of TiDB and identify any potential bottlenecks or issues.
  • Consider implementing a proper backup and disaster recovery strategy for your TiDB cluster to ensure data safety and availability.
  • Stay up to date with the latest TiDB releases and documentation to take advantage of performance improvements and new features.
  • Engage with the TiDB community and seek assistance from the TiDB team or community members if you encounter any challenges or have specific requirements.
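One way to act on the benchmarking advice is to replay the same batches against MariaDB and TiDB with a single harness; both speak the MySQL protocol, so the same client code can target either. The sketch below is a minimal, single-threaded timer. insert_batch is a hypothetical callback you would implement with your driver of choice (for example, cursor.executemany followed by a commit on a DB-API connection); the fake_insert stand-in only exists so the sketch runs without a server.

```python
import time
from typing import Callable, Iterable, Sequence

Row = Sequence[object]


def measure_rows_per_second(insert_batch: Callable[[Sequence[Row]], None],
                            batches: Iterable[Sequence[Row]]) -> float:
    """Call insert_batch on each batch in order and return rows written per second."""
    total_rows = 0
    start = time.perf_counter()
    for batch in batches:
        insert_batch(batch)  # e.g. cursor.executemany(...) followed by commit
        total_rows += len(batch)
    elapsed = time.perf_counter() - start
    return total_rows / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a real database call, so the sketch runs without a server.
    def fake_insert(batch: Sequence[Row]) -> None:
        time.sleep(0.001)

    batches = [[(i, "payload")] * 100 for i in range(20)]
    print(f"{measure_rows_per_second(fake_insert, batches):.0f} rows/s")
```

Because it measures one client issuing batches serially, it approximates a job like the 4-hour customer process; to measure aggregate throughput, run several copies concurrently and add up the results.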

Overall, migrating from MariaDB to TiDB can bring significant benefits in terms of scalability, high availability, and ease of management. With proper planning, configuration, and optimization, you can achieve good write performance in TiDB while leveraging your existing servers.


1. For TiDB data ingestion (Question 1), here are some best practices:
a) For high performance, use small-batch inserts with high concurrency:
Use Batch API
b) Avoid write hotspots when ingesting data into TiDB:
Hotspot case
c) In one of our customers’ deployments, a TiDB cluster supports about 30 TB per day and 150K rows per second (row size 1.5 KB).
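A minimal sketch of the "small batch insert with high concurrency" pattern in Python, assuming a MySQL-protocol DB-API driver that uses %s placeholders (PyMySQL, for example). The events table, its columns, and the batch_size and workers defaults are hypothetical starting points, not tuned values. The example DDL uses TiDB’s AUTO_RANDOM attribute, one documented way to keep new primary keys from piling onto a single Region (the hotspot case above).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

# Hypothetical table. AUTO_RANDOM (a TiDB extension) fills the high bits of the
# BIGINT key with random shard bits, so consecutive inserts spread across Regions.
EXAMPLE_DDL = ("CREATE TABLE events ("
               "id BIGINT PRIMARY KEY AUTO_RANDOM, payload VARCHAR(1024))")

Row = Sequence[object]
Execute = Callable[[str, List[object]], None]


def chunks(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Split rows into consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def multi_row_insert_sql(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build one INSERT ... VALUES (...), (...) statement with %s placeholders."""
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([one_row] * n_rows))


def parallel_insert(rows: Sequence[Row], table: str, columns: Sequence[str],
                    execute: Execute, batch_size: int = 100,
                    workers: int = 16) -> int:
    """Insert rows as small multi-row batches on a thread pool; return rows written.

    execute(sql, params) must be thread-safe, e.g. each worker thread using its
    own connection, because DB-API connections are generally not shareable.
    """
    def run(batch: Sequence[Row]) -> int:
        sql = multi_row_insert_sql(table, columns, len(batch))
        params = [value for row in batch for value in row]  # flatten row values
        execute(sql, params)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run, chunks(rows, batch_size)))
```

The id column is left out of the column list so TiDB generates it. Keeping each statement small keeps transactions short, while the thread pool supplies the concurrency that lets writes spread across TiKV nodes.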

3. For the MariaDB migration (Question 3), you can use the DM tool. It supports both migrating existing data and replicating incremental changes, so you can keep DM syncing data from MariaDB to TiDB in real time and choose whether to run your business on MariaDB or on the TiDB cluster.
a) About the DM tool: dm-overview
b) A detailed case study of DM syncing data in real time: https://www.pingcap.com/case-study/reduced-batch-processing-time-by-58-percent-with-a-scale-out-mysql-alternative/

Also, we (the PingCAP team) have extensive experience with data migration. If you want to discuss the details of a solution or need any help, feel free to reach out to me.