Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TiSpark 使用指南&资料大全🔥 (TiSpark User Guide & Resource Collection)
What is TiSpark?
TiSpark is a product launched by PingCAP to address users’ complex OLAP needs. It leverages the Spark platform and integrates the advantages of the TiKV distributed cluster, working together with TiDB to provide users with a one-stop solution for HTAP (Hybrid Transactional/Analytical Processing) needs.
TiSpark integrates deeply with the Spark Catalyst engine, which gives it precise control over computation and lets Spark read data from TiKV efficiently. TiSpark also supports index lookups, which speed up point queries.
TiSpark pushes computation down to TiKV, which reduces the amount of data Spark SQL has to process and makes queries faster. It also uses TiDB's built-in statistics to choose a better query plan.
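To make the integration a little more concrete, here is a minimal sketch of the Spark settings that typically load TiSpark into a Spark session. It assumes TiSpark 3.x style configuration keys; the PD address `127.0.0.1:2379`, the catalog name `tidb_catalog`, and the helper names `tispark_conf` and `build_session` are placeholders for illustration, so check them against the TiSpark version you deploy.

```python
def tispark_conf(pd_addresses, catalog="tidb_catalog"):
    """Return the Spark settings that register TiSpark with Spark SQL.

    pd_addresses: comma-separated host:port list of PD nodes (placeholder values here).
    catalog: name under which TiDB tables become visible (an assumption).
    """
    return {
        # Hooks TiSpark's rules into the Catalyst planner.
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        # Tells TiSpark where to find the cluster's PD nodes.
        "spark.tispark.pd.addresses": pd_addresses,
        # Exposes TiDB databases through a Spark catalog.
        f"spark.sql.catalog.{catalog}": "org.apache.spark.sql.catalyst.catalog.TiCatalog",
        f"spark.sql.catalog.{catalog}.pd.addresses": pd_addresses,
    }


def build_session(conf, app_name="tispark-demo"):
    """Create a SparkSession with the given settings.

    Needs pyspark installed and the TiSpark jar on the Spark classpath,
    so the import is deferred until this function is actually called.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = tispark_conf("127.0.0.1:2379")
for key, value in conf.items():
    # Same pairs can be passed to spark-submit as --conf key=value.
    print(f"{key}={value}")
```

With a live cluster, `build_session(conf)` would return a session on which something like `spark.sql("select count(*) from tidb_catalog.test.orders")` could run, where `test.orders` is a hypothetical table; filters and aggregates in such a query are the kind of work TiSpark can push down to TiKV.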
TiSpark and TiDB allow users to perform both transactional and analytical tasks on the same platform without the need to create and maintain ETL processes. This simplifies system architecture and reduces operational costs.
With TiSpark, users can apply tools from the Spark ecosystem to data stored in TiDB. The components divide the work as follows:
- TiSpark: Data analysis and ETL
- TiKV: Data retrieval
- Scheduling system: Report generation
Additionally, TiSpark can write to TiKV in a distributed fashion. Unlike writing to TiDB from Spark over JDBC, a distributed write to TiKV is transactional: either all of the data is written or none of it is.
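As a rough illustration of the distributed write path, the sketch below assembles the options usually passed to TiSpark's `tidb` data source and hands a DataFrame to it. The option keys follow the TiSpark data source documentation as far as I know; the address, port, credentials, and the helper names `tidb_write_options` and `write_batch` are assumptions for the example, not values from this post.

```python
def tidb_write_options(database, table, addr="127.0.0.1", port=4000,
                       user="root", password=""):
    """Build the option map for a TiSpark batch write.

    All connection defaults are placeholders; replace them with the
    TiDB server your cluster actually exposes.
    """
    return {
        "tidb.addr": addr,
        "tidb.port": str(port),
        "tidb.user": user,
        "tidb.password": password,
        "database": database,
        "table": table,
    }


def write_batch(df, options):
    """Write a Spark DataFrame to TiKV through TiSpark.

    The whole DataFrame is committed as one transaction, so a failure
    leaves the target table unchanged. TiSpark expects append mode.
    """
    df.write.format("tidb").options(**options).mode("append").save()


opts = tidb_write_options("test", "orders")  # hypothetical database and table
# Print everything except the password.
print({k: v for k, v in opts.items() if k != "tidb.password"})
```

Calling `write_batch(df, opts)` needs a SparkSession that already has TiSpark loaded and a target table that exists in TiDB. By contrast, a JDBC write from Spark usually commits per partition or per batch, so a mid-job failure can leave partial data behind.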
TiSpark Deployment and Usage
Collection of Articles on TiSpark Development Practices and New Features
Collection of Articles on TiSpark Source Code Interpretation
Mass Data Batch Processing Technology Based on TiSpark | TiDB Tools
Popular Q&A on TiSpark
If you have any questions about TiSpark, feel free to ask on AskTUG. Click to view the Question Search Guide & Posting Guidelines (问题搜索指南&提问准则) - TiDB Q&A Community.