Just started with TiDB: can it serve as an alternative to Impala + Kudu? Does it have an advantage in high-concurrency query performance?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 刚刚接触TiDB, 想了解下可否作为 impala+kudu 的替带方案呢?在高并发查询的性能上有没有优势。

| username: TiDBer_RbcaAMmi

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] 6.5

| username: zhaokede | Original post link

TiDB has significant advantages in high concurrency scenarios, mainly due to its distributed architecture and unique design features. These include cloud-native design, distributed architecture with elastic scaling, financial-grade high availability, real-time HTAP capabilities, and optimizations for high concurrency batch write scenarios.

| username: 友利奈绪 | Original post link

  1. Distributed Architecture: TiDB adopts a distributed architecture where data is stored across multiple nodes, allowing for horizontal scaling to meet the demands of large-scale data storage and processing. This architecture enables TiDB to easily handle large-scale data and high-concurrency access.
  2. Horizontal Scalability: TiDB can achieve horizontal scaling simply by adding more nodes without needing to modify application code. This allows TiDB to easily cope with data growth and traffic changes.
  3. Consistency and High Availability: TiDB supports strong consistency and high availability. It uses the Raft protocol for consistent data replication and automatic failover, so data stays safe and available when a minority of replicas fail.
  4. Distributed Transactions: TiDB supports distributed ACID transactions, ensuring consistency and atomicity of data operations across multiple nodes. This makes TiDB suitable for applications with strict transactional requirements.
  5. SQL Compatibility: TiDB is compatible with the MySQL protocol and most MySQL syntax, so in many cases it can replace MySQL as the backing database with little or no change to existing application code. This makes migrating existing applications to TiDB easier.
  6. Real-time Analytics: TiDB supports both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) in one system (HTAP), with the TiFlash columnar engine serving analytical queries. This lets it run complex queries and large-scale analysis on fresh transactional data.
  7. Automated Operations: TiDB provides TiUP (which replaced the older TiDB Ansible starting with TiDB 4.0) to simplify deployment and operations, offering automated management features that reduce operational complexity and costs.
  8. Ecosystem and Community Support: The TiDB ecosystem is continuously growing, with a rich set of tools and components that can integrate with TiDB to meet various needs. Additionally, TiDB has an active community that provides timely technical support and solutions.
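
To make the Raft point in item 3 concrete: a Raft group commits a write once a majority of its replicas acknowledge it, so the number of replica failures a group can survive follows from simple arithmetic. A minimal Python sketch; the function names are illustrative only and are not part of any TiDB API:

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


for n in (3, 5):
    # Prints "3 2 1" and then "5 3 2".
    print(n, quorum_size(n), tolerated_failures(n))
```

With three replicas a group keeps serving after losing one; five replicas tolerate two. An even count adds no extra tolerance (four replicas still tolerate only one failure), which is why odd replica counts are the usual choice.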

| username: Jellybean | Original post link

TiDB is suitable for massive data storage and high-concurrency OLTP scenarios, as well as large-scale data and high-concurrency scenarios that require real-time processing. It also performs excellently in real-time data analysis scenarios. For more details, you can check the official website introduction:

| username: ziptoam | Original post link

TiDB can serve as an alternative to Impala combined with Kudu, especially when you want a database that can handle both daily transactions and data analysis. Since it is distributed, as the amount of data and the number of users increase, you can improve processing power and storage by adding servers, which is beneficial for high-concurrency queries.

TiDB supports HTAP, meaning data analysis can be done directly on the transactional database without data migration, saving a lot of effort. While Impala paired with Kudu is also very suitable for data analysis, especially within the Hadoop ecosystem, it is not as straightforward for transaction processing and may involve more system management and resource allocation.
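
In TiDB, the analytical side of HTAP is served by TiFlash, a columnar replica of the row-store data. The statements involved can be sketched as below, built as strings in Python so they can be pasted into any MySQL client. The table name `orders` and the replica count of 1 are assumptions for illustration, and the helper names are hypothetical:

```python
import re


def _check_identifier(name: str) -> str:
    """Allow only plain identifiers, since the name is interpolated into SQL."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsupported table name: {name!r}")
    return name


def tiflash_replica_ddl(table: str, replicas: int = 1) -> str:
    """DDL asking TiDB to keep `replicas` columnar TiFlash copies of a table."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return f"ALTER TABLE `{_check_identifier(table)}` SET TIFLASH REPLICA {replicas};"


def replica_progress_query(table: str) -> str:
    """Query to check whether the TiFlash replica has finished syncing."""
    return (
        "SELECT TABLE_NAME, AVAILABLE, PROGRESS "
        "FROM information_schema.tiflash_replica "
        f"WHERE TABLE_NAME = '{_check_identifier(table)}';"
    )


if __name__ == "__main__":
    print(tiflash_replica_ddl("orders"))
    print(replica_progress_query("orders"))
```

Once the replica is available, the optimizer can choose between the row store and the columnar replica per query, so no separate ETL pipeline is needed for this table.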

As for query performance, TiDB’s distributed nature allows it to perform well when handling a large number of concurrent requests and can scale horizontally as needed. Impala and Kudu can also be very fast in certain scenarios but may require more detailed cluster management and tuning to ensure performance under high concurrency.
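
Claims about high-concurrency performance are best checked against your own workload on both systems. A minimal harness sketch in Python, assuming you supply a `query` callable that runs one representative query through your driver of choice; `fake_query` here is only a stand-in so the sketch runs without a database:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable


def run_concurrent(query: Callable[[], None], workers: int, requests: int) -> dict:
    """Run `query` `requests` times across `workers` threads; report QPS and latency."""
    if requests < 2:
        raise ValueError("need at least 2 requests to compute percentiles")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed_call, range(requests)))
    wall = time.perf_counter() - wall_start

    # 99 cut points: cuts[49] approximates p50, cuts[98] approximates p99.
    cuts = quantiles(latencies, n=100)
    return {
        "requests": requests,
        "qps": requests / wall,
        "p50_ms": cuts[49] * 1000,
        "p99_ms": cuts[98] * 1000,
    }


def fake_query() -> None:
    """Stand-in for a real query; replace with a call through your DB driver."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_concurrent(fake_query, workers=8, requests=200))
```

Running the same harness against TiDB and against Impala + Kudu at several `workers` values gives a like-for-like view of how throughput and tail latency change as concurrency grows.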

TiDB is easier to get started with for many users because it is compatible with MySQL. Impala, on the other hand, is more suitable for environments already using Hadoop. The choice depends on your specific needs, existing technology stack, and team familiarity. In summary, both have their strengths, and the key is to find the one that best fits your current situation.

| username: 濱崎悟空 | Original post link

TiDB combines OLAP and OLTP in one system (HTAP).

| username: Kongdom | Original post link

You can refer to this article