I would like to ask everyone, for a user-based aggregate query, such as sum, is Tiflash needed to accelerate it?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 想请教大家,一个基于用户的聚合查询,比如sum ,需要Tiflash来加速吗?

| username: breakyang

The approximate statement is
select user, sum(user_amount) from user_bank where userId = ?

| username: Kongdom | Original post link

Acceleration is definitely possible. If TiKV can meet the requirements, use TiKV. If TiKV cannot meet the requirements, use TiFlash for acceleration. Whether acceleration is needed depends on the business.

| username: 我是咖啡哥 | Original post link

It depends on how large your table is and how many rows meet the conditions. Your userId should be able to filter a lot of data. Generally, having a well-built index should suffice. If the concurrency is very high and the table is particularly large, using TiFlash would be better.

| username: redgame | Original post link

One thing to note: TiFlash is suitable for analytical queries and report queries, but it is not suitable for transactional queries (such as updates, inserts, and deletes).

| username: ShawnYan | Original post link

Of course, it can be used, especially if you have the resources to use TiFlash.

| username: tidb菜鸟一只 | Original post link

If a user has a large number of rows, it is suitable. If not, using TiFlash solely for this SQL is unnecessary.

| username: zhouzeru | Original post link

If the data volume is very small, the advantages of TiFlash may not be very noticeable because TiFlash is designed for large-scale data storage and querying.

| username: zhanggame1 | Original post link

It may not necessarily be useful. If the userId column has an index and the proportion of scanned data is not large, using TiKV might be faster.

| username: 有猫万事足 | Original post link

If you have resources, go for it.

select user, sum(user_amount) from user_bank where userId = ?

The execution plan with index is shown above, taking 0.03s.

select user, sum(user_amount) from user_bank group by user

For the same table, directly group by without where
The TiFlash MPP execution plan is shown above, aggregating 30 million rows into 30 thousand rows, and it takes about 5.5 seconds to get the result.

If you have resources, go for it. It’s definitely worth it.
If you don’t have resources, it’s fine; using an index to query one is also manageable.

| username: Jellybean | Original post link

It depends on the data filtering situation of your condition.

If your WHERE condition can filter out the vast majority of data, leaving only a few rows that need to be summed, it indicates a heavy reliance on filtering based on userId. In this case, using the index on TiKV’s row storage can solve your problem and be more efficient. There’s no need to use TiFlash, as it might actually slow things down.

However, if your WHERE condition does not filter much and scans a large number of rows in the table (e.g., tens of thousands or millions of rows), using TiFlash’s columnar storage for statistics can greatly improve efficiency, and the effect will be quite noticeable.

| username: cassblanca | Original post link

TiFlash is columnar storage, which is friendly for statistical analysis operations. Similar to ClickHouse, it offers outstanding performance for single-table queries and aggregations of any scale, typically responding within seconds. TiFlash is similar but depends on the specific business scenario. If there are many statistical analysis AP operations, you can try TiFlash MPP.

| username: forever | Original post link

If there aren’t many analytical SQL queries, adding an index for this scenario can solve the problem and relatively save resources.

| username: cy6301567 | Original post link

If the data volume is large and aggregate queries are frequent, it is recommended to use TiFlash for acceleration.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.