TiDB Fuzzy Query Speed is Too Slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB模糊查询速度过慢

| username: TiDB小萌新

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.4.0
[TiDB Configuration] 8 database nodes, 8 cores, 16GB cluster
TiDB fuzzy query is too slow with a data volume of 100 million. Dear experts, is there any room for optimization for this type of query?

| username: 有猫万事足 | Original post link

You need an execution plan, brother. :joy:
Otherwise, just looking at the results isn’t enough; you won’t know where the issue lies.

| username: TiDB小萌新 | Original post link

This one is forced to use TiFlash.


This one is not forced to use TiFlash.

| username: tidb菜鸟一只 | Original post link

I think I saw this before. My suggestion is, if you want to speed up, you should create a composite index on all the fields you query, filter, and sort on… This way, you don’t need to go back to the table, and it will be faster.

| username: TiDB小萌新 | Original post link

Last time, I followed your suggestion and added a composite index, and now the list is very fast without query conditions. However, my SQL may have many LIKE query conditions, and LIKE does not use indexes, right?

| username: 有猫万事足 | Original post link

You need to continue optimizing the TiFlash query. If you have more than one TiFlash, you can try the MPP mode. I saw ExchangeSender in your execution plan, so you are already using MPP mode.

Actually, I feel that TiKV might be more suitable for your query, especially since the id_card field should be particularly suitable for indexing. But the query didn’t use it, possibly because there is no index, or because of the %idcard%, where the % at the beginning causes the issue. If possible, adding an index on idcard and using idcard% as a condition should be able to utilize the index. Other conditions that can quickly narrow down to a small range of values can also be indexed to see if it helps.

| username: TiDB小萌新 | Original post link

CREATE TABLE license (
id varchar(20) NOT NULL,
create_time bigint(20) DEFAULT NULL,
update_time bigint(20) DEFAULT NULL,
is_deleted int(11) DEFAULT ‘0’,
user_id varchar(100) DEFAULT NULL COMMENT ‘User ID’,
zlb_user_id varchar(100) DEFAULT NULL COMMENT ‘Zhejiang Government Service ID’,
ins_id varchar(20) DEFAULT NULL,
org_id varchar(20) DEFAULT NULL COMMENT ‘Achievement Directory ID’,
license_type varchar(255) DEFAULT NULL COMMENT ‘Achievement Type’,
license_data json DEFAULT NULL COMMENT ‘Achievement Field Details’,
org_license_type varchar(30) DEFAULT NULL COMMENT ‘Achievement org_license Subdivision’,
course_id varchar(20) DEFAULT NULL COMMENT ‘Course ID’,
course_name varchar(100) DEFAULT NULL COMMENT ‘Course Name’,
tran_time bigint(0) DEFAULT ‘0’ COMMENT ‘Conversion Time’,
apply_file json DEFAULT NULL COMMENT ‘Application File’,
mec_id varchar(50) DEFAULT NULL COMMENT ‘Institution ID’,
status varchar(20) DEFAULT NULL COMMENT ‘Review Status’,
tran_status varchar(30) DEFAULT NULL COMMENT ‘Conversion Status’,
first_mec_id varchar(30) DEFAULT NULL COMMENT ‘Primary Institution ID’,
second_mec_id varchar(30) DEFAULT NULL COMMENT ‘Secondary Institution ID’,
license_code varchar(50) DEFAULT NULL COMMENT ‘Achievement Code’,
hash_value varchar(256) DEFAULT NULL COMMENT ‘Blockchain Hash’,
source varchar(30) DEFAULT NULL,
id_card varchar(50) DEFAULT NULL COMMENT ‘ID Card’,
credit_hour varchar(30) DEFAULT NULL,
user_name varchar(50) DEFAULT NULL,
user_code varchar(30) DEFAULT NULL,
org_code varchar(50) DEFAULT NULL,
license_name varchar(100) DEFAULT NULL,
career_qualification_data json DEFAULT NULL COMMENT ‘Certificate, Course, Major Information’,
operation_phone varchar(50) DEFAULT NULL COMMENT ‘Operator Phone Number’,
operation_name varchar(50) DEFAULT NULL COMMENT ‘Operator Name’,
mec_name varchar(50) DEFAULT NULL COMMENT ‘Institution Name’,
is_old tinyint(3) DEFAULT ‘0’ COMMENT ‘Is Old System’,
old_id varchar(50) DEFAULT NULL COMMENT ‘Old System ID’,
KEY zlbuserid_idx (zlb_user_id),
PRIMARY KEY (id) /*T![clustered_index] NONCLUSTERED */,
KEY id_card_idx (id_card),
KEY create_time_idx (create_time),
KEY idx_delete_type_mec_create (is_deleted,org_license_type,mec_id,create_time),
KEY idx_delete_type_first_mec_create (is_deleted,org_license_type,first_mec_id,create_time),
KEY idx_deleted_orgId (is_deleted,org_id),
KEY idx_delete_type_second_mec_create (is_deleted,org_license_type,second_mec_id,create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

| username: TiDB小萌新 | Original post link

I created an index for this table, but because of %idCard%, it doesn’t use the index, resulting in a large amount of data retrieval. That’s why I used a hint to use TiFlash, but even then, pagination queries take more than ten seconds. I have two TiFlash nodes.


MMP has already been enabled.

| username: redgame | Original post link

Create a prefix index on the field being queried with a fuzzy search.

| username: TiDB小萌新 | Original post link

Is a prefix index useful for %idCard% fuzzy queries?

| username: tidb菜鸟一只 | Original post link

The like '%...%' query indeed cannot use indexes, while like '...%' can. TiDB does not support full-text indexing. Are you sure the id_card field requires the like '%...%' query condition?

| username: TiDB小萌新 | Original post link

Yes, it is required for business purposes, and there are other similar queries like user_name that also need ‘%…%’ type searches.

| username: zhanggame1 | Original post link

%…% this kind of like cannot be optimized. You can look into expression indexes to see if it’s possible to convert the query condition into an expression and then create an index.
CREATE INDEX | PingCAP Documentation Center

| username: 啦啦啦啦啦 | Original post link

We use Elasticsearch for those like %% queries.

| username: tidb菜鸟一只 | Original post link

If it really doesn’t work, try syncing the data to Elasticsearch… TiDB itself doesn’t have much room for optimization for this kind of SQL.

| username: TiDB小萌新 | Original post link

Alright, I’ll communicate with our business department. If it doesn’t work, we’ll have to switch to ES.

| username: Kongdom | Original post link

It seems like you don’t need to use “like” for this query.

| username: cassblanca | Original post link

Using Like in TiDB alone can lead to ambiguous results or failure to return results that meet the search criteria. From the perspective of TiDB indexing, it cannot be further optimized. A better approach is to use other methods, such as full-text search with Elasticsearch (ES).