Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Our current business scenario: user profiling goes through an association table, so tag data and user data are joined at query time. The tag and user data run to the tens of millions and hundreds of millions of rows, and the join query currently takes about 20-30 seconds. We previously researched a bitmap storage scheme. Is there a better approach for this scenario based on TiDB?
【TiDB Usage Environment】Production Environment
【TiDB Version】tidb v6.1.0
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Slow Join Query】
【Resource Configuration】5 TiKV nodes (8 cores, 32 GB each), 3 TiFlash nodes (16 cores, 64 GB each)
【Attachments: Screenshots/Logs/Monitoring】
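For reference, a minimal sketch of the kind of schema and join described above. All table, column, and ID names here are assumptions, not taken from the actual system:

```sql
-- Hypothetical schema: tens of millions of users,
-- hundreds of millions of tag rows.
CREATE TABLE users (
    user_id  BIGINT PRIMARY KEY,
    reg_time DATETIME
);

CREATE TABLE user_tags (
    user_id BIGINT NOT NULL,
    tag_id  INT    NOT NULL,
    PRIMARY KEY (user_id, tag_id),
    KEY idx_tag (tag_id, user_id)  -- covering index for "all users with tag X"
);

-- Point query: one user's tags. With the primary key above,
-- this should take milliseconds.
SELECT tag_id FROM user_tags WHERE user_id = 12345;

-- Crowd selection: all users carrying both tags. This is the expensive
-- direction that tends to take tens of seconds at this scale.
SELECT t.user_id
FROM user_tags t
WHERE t.tag_id IN (101, 202)
GROUP BY t.user_id
HAVING COUNT(DISTINCT t.tag_id) = 2;
```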
With tens of millions of users and hundreds of millions of profile tag rows, where each user averages dozens of tags, a query for a single user should not take 20-30 seconds even if that user has hundreds of tags. So this is very strange; the real question is not which database to use, but why the current setup is so slow.
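If it helps, one way to see where the time goes is TiDB's EXPLAIN ANALYZE, which runs the statement and reports per-operator execution times. The query shape and names below follow the hypothetical schema sketched earlier:

```sql
-- Verify the single-user lookup hits the primary key
-- (Point_Get / IndexRangeScan) instead of a full table scan.
EXPLAIN ANALYZE
SELECT tag_id FROM user_tags WHERE user_id = 12345;
```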
We are facing similar confusion, mainly around tag computation: point queries against a user's profile are generally fine, but mining and recommendation workloads are relatively slow.
We used to use Redis, but it really consumes a lot of machine resources.
We are currently on Alibaba Cloud ADB, which works but is too expensive, so we are also evaluating whether TiDB's TiFlash can cover OLAP on large data volumes.
TiFlash supports MPP; you could run a POC.
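A minimal sketch for such a POC, using the hypothetical tables from the schema above (replica counts are placeholders):

```sql
-- Create TiFlash replicas for the tables involved in the join.
ALTER TABLE users     SET TIFLASH REPLICA 2;
ALTER TABLE user_tags SET TIFLASH REPLICA 2;

-- Wait for replication to finish (PROGRESS reaches 1, AVAILABLE becomes 1).
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica;

-- Force MPP for this session, then check that the plan shows
-- ExchangeSender / ExchangeReceiver operators running on TiFlash.
SET @@session.tidb_enforce_mpp = ON;
EXPLAIN
SELECT t.user_id
FROM user_tags t
WHERE t.tag_id IN (101, 202)
GROUP BY t.user_id
HAVING COUNT(DISTINCT t.tag_id) = 2;
```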
Cache: use a caching mechanism such as Redis to hold the results of hot queries, so they don't hit the database every time.
Use TiFlash: TiFlash is the columnar storage engine in the TiDB ecosystem, designed specifically for OLAP queries (see the sketch below).
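If only part of the workload should go to TiFlash, TiDB can also steer the storage engine choice per session or per statement; a sketch using the same hypothetical tables:

```sql
-- Session level: restrict reads to TiFlash (plus TiDB memory tables).
SET @@session.tidb_isolation_read_engines = 'tiflash,tidb';

-- Statement level: an optimizer hint that sends just this scan to TiFlash.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[t]) */ t.tag_id, COUNT(*) AS users
FROM user_tags t
GROUP BY t.tag_id;
```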
Redis can’t store such a large amount of data, right?
Check out TiDB Serverless architecture in the cloud, with Pay-As-You-Go pricing.
You can search for Ctrip's practice.