Seeking Best Practices for Handling Semi-Structured Data with TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 寻求TiDB处理半结构化数据的最佳实践

| username: TiDBer_oHpB1az2

I am conducting a technology survey for a project my company is about to launch. The data is expected to be roughly half structured and half semi-structured. The system will be deployed in an internal network data center.

If it were deployed on a public cloud, we would probably go with MySQL + MongoDB, since almost no maintenance would be needed there.

Since it will run in an internal network data center, a distributed database is necessary. The system will also likely be split into several subsystems, each with its own independent data cluster.

Using a TiDB cluster + MongoDB cluster feels too complex. I hope to handle everything with one setup.

There will probably not be many transactional operations, so storing the structured data in MongoDB would not be a big issue either. But the MongoDB community feels almost non-existent.

If we use TiDB, I am very concerned about its support for semi-structured data. I checked the documentation: TiDB has some basic JSON support, but it still seems to be experimental: JSON Type | PingCAP Docs

So I am seeking some practical experience with TiDB handling semi-structured data and technical selection advice.

| username: pingyu | Original post link

TiKV can be directly used as a NoSQL database.
Reference: TiKV | TiKV API v2

With TiKV API v2, data is separated by mode, so TiDB, TxnKV, and RawKV can be used in a single cluster at the same time.

It can meet the requirement of handling everything with one setup.
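To make the RawKV option concrete, here is a minimal sketch of storing semi-structured JSON documents in TiKV as raw key/value pairs. The `<collection>:<doc_id>` key layout is my own illustrative convention, not an official TiKV schema, and the commented client calls assume the community `tikv-client` Python package (check the TiKV client docs for the exact API before relying on it).

```python
import json

def encode_doc(collection: str, doc_id: str, doc: dict) -> tuple[bytes, bytes]:
    """Build a RawKV key/value pair for one semi-structured document.

    Key layout "<collection>:<doc_id>" is an assumption for illustration;
    the value is the document serialized as UTF-8 JSON.
    """
    key = f"{collection}:{doc_id}".encode("utf-8")
    value = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return key, value

def decode_doc(value: bytes) -> dict:
    """Parse a RawKV value back into a Python document."""
    return json.loads(value.decode("utf-8"))

# With the community `tikv-client` package (an assumption -- verify the
# actual API in the TiKV docs), usage would look roughly like:
#
#   from tikv_client import RawClient
#   client = RawClient.connect(["127.0.0.1:2379"])  # PD endpoints
#   key, value = encode_doc("devices", "dev-001", {"model": "X1"})
#   client.put(key, value)
#   doc = decode_doc(client.get(key))
```

Because keys sort lexicographically, prefixing them with the collection name also lets a RawKV scan over `devices:` enumerate one "collection" at a time.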

| username: xfworld | Original post link

Here are a few reference points that I have listed:

  1. The size of a single JSON column is limited (6 MB maximum).
  2. The scope of transaction operations needs to be determined (if a column is too large, more memory is required to maintain the transaction's two-phase commit).
  3. JSON query and indexing capabilities are still experimental (deep involvement is needed; better optimization may come later).
  4. You should be able to live with simple CRUD operations that rely on the primary key (operation speed is then guaranteed).

If the above listed information is acceptable, you can use it boldly.
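The 6 MB single-column limit from point 1 is worth guarding against in application code before an insert ever reaches TiDB. A minimal sketch, assuming documents arrive as Python dicts and the default 6 MB limit is in effect (the exact limit is configurable, so check your cluster's settings):

```python
import json

# Default TiDB limit on a single column value, per point 1 above.
# This is an assumption about the deployment's configuration.
MAX_COLUMN_BYTES = 6 * 1024 * 1024  # 6 MB

def json_column_ok(doc: dict) -> bool:
    """Return True if the serialized document fits within the
    assumed 6 MB single-column limit, False otherwise."""
    payload = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return len(payload) <= MAX_COLUMN_BYTES
```

Rejecting oversized documents up front also sidesteps point 2: smaller column values keep the memory cost of the transaction's two-phase commit bounded.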

Below are the reference limits for rows, columns, and types:

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.