Comparison of Query Times Between TiDB JSON Queries and Regular LIKE Queries Being Too Slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB json查询与常规like查询对比时间过慢

| username: TiDB小萌新

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4
For the same table with 18 million records, executing a count(*) operation with a LIKE fuzzy query takes only 0.2 seconds, while querying JSON data takes 10 seconds. According to the execution plan, using LIKE is slightly more efficient than using JSON queries. Could any experts advise on how to optimize this?

| username: tidb菜鸟一只 | Original post link

The JSON function is the slowest function in the database. Using it means that the JSON function has to be executed for each row of data, which is definitely slower than directly querying the entire table…

| username: MrSylar | Original post link

The functions json_contains and json_array are not supported for pushdown by TiFlash, which results in too much unnecessary data being returned to the TiDB server for further processing.

| username: MrSylar | Original post link

I think a feasible approach is to add new where conditions to reduce the intermediate result set.

| username: TiDB小萌新 | Original post link

I converted the JSON to VARCHAR type. Would using LIKE be better? The content for LIKE is unique.

| username: TiDB小萌新 | Original post link

What if we directly convert the field to varchar type?

| username: 有猫万事足 | Original post link

The TiDB version is too low; there’s no good solution in 5.4. If it’s 7.1, for commonly used JSON query nodes, you can consider using generated columns.

If there are multiple values within a node, you can also create a multi-valued index.

If you can upgrade, consider preparing for an upgrade.

| username: Jellybean | Original post link

The JSON generated column index is a very good feature. We have used this feature in our production environment, and it has significantly improved the efficiency of JSON query filtering. The business side is very satisfied with the response speed, and I highly recommend using it.

| username: redgame | Original post link

Is the index appropriate?

| username: zhanggame1 | Original post link

LIKE is definitely faster. It scans the string and returns success if it contains the specified pattern, whereas the function also needs to parse the JSON file.

| username: tidb菜鸟一只 | Original post link

Using LIKE would be better for two reasons: first, it doesn’t require JSON functions to parse fields, and second, LIKE can be pushed down. However, I’ve never understood why JSON needs to be stored in the database. Why can’t the application layer parse the JSON and store the key fields directly in the table? If you really need to store a lot of JSON fields, wouldn’t it be better to use MongoDB? The so-called generated columns are essentially parsing the JSON, storing the key fields in the table, and then creating an index. Why should this work be done in the database? Wouldn’t it be better to do it at the application layer?

| username: TiDBer_iCdTOZ1r | Original post link

It can be converted to varchar.

| username: cy6301567 | Original post link

It is recommended not to use JSON. If you have to use fuzzy search, try to use a single LIKE to go through the index. If it’s a large amount of data, it’s better to use Elasticsearch for fuzzy search.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.