Fuzzy query using FIND_IN_SET requires a FULLTEXT index, but TiDB does not support this index

translator_bot · June 22, 2024, 5:49am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 模糊查询使用FIND_IN_SET需要建立FULLTEXT索引，但是tidb不支持这个索引

| username: 今天不想写代码

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Fuzzy search using FIND_IN_SET requires creating a FULLTEXT index, but TiDB does not support this index. The table has only 90,000 rows of data, but when the offset for the fuzzy search limit is large, it starts reporting slow queries (above 0.3s).
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
SELECT
…
FROM
… ca
WHERE
ca.is_del = 0
AND ca.user_key LIKE CONCAT(‘%’, ‘abc’, ‘%’)
ORDER BY
ca.course_date DESC
LIMIT
35, 5;

translator_bot · June 22, 2024, 5:49am

| username: zhanggame1 | Original post link

Without considering LIMIT, how many entries meet the criteria? If there are too many, taking more than 0.3 seconds is normal.

translator_bot · June 22, 2024, 5:49am

| username: 今天不想写代码 | Original post link

Checking the number of entries for a user is only 130. I just tested it, and without limit, it only takes 0.1 seconds. With limit 35,5, it takes 0.3-0.4 seconds. I feel that limit is also quite time-consuming.

translator_bot · June 22, 2024, 5:49am

| username: tidb狂热爱好者 | Original post link

You can try TiFlash.

translator_bot · June 22, 2024, 5:49am

| username: zhanggame1 | Original post link

Could you please share the execution plan to see if any experts here can optimize it?

translator_bot · June 22, 2024, 5:49am

| username: 今天不想写代码 | Original post link

I set up TiFlash, but it doesn’t seem to be used.

translator_bot · June 22, 2024, 5:49am

| username: 今天不想写代码 | Original post link

Supplementary Execution Process

id	task	estRows	operator info	actRows	execution info	memory	disk
Projection_9	root	5	xg_cloud.course_attendance.id, xg_cloud.course_attendance.start_time, xg_cloud.course_attendance.end_time, xg_cloud.course_attendance.class_room, xg_cloud.course_attendance.class_name, xg_cloud.course_attendance.jc, xg_cloud.course_attendance.zc, xg_cloud.course_attendance.course_name, date_format(cast(xg_cloud.course_attendance.course_date, datetime(6) BINARY), %m月%d日)->Column#31	5	time:411ms, loops:2, Concurrency:OFF	2.71 KB	N/A
└─Limit_14	root	5	offset:35, count:5	5	time:411ms, loops:2	N/A	N/A
└─IndexLookUp_22	root	40		40	time:411ms, loops:1, index_task: {total_time: 107.9ms, fetch_handle: 3.89ms, build: 10.4ms, wait: 93.6ms}, table_task: {total_time: 1.56s, num: 13, concurrency: 5}	1.75 MB	N/A
├─IndexFullScan_19(Build)	cop[tikv]	50.03	table:ca, index:index_course_date(course_date), keep order:true, desc	87220	time:937.2µs, loops:92, cop_task: {num: 1, max: 591.5µs, proc_keys: 0, rpc_num: 1, rpc_time: 578.7µs, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15}, tikv_task:{time:43ms, loops:90}	N/A	N/A
└─Selection_21(Probe)	cop[tikv]	40	eq(xg_cloud.course_attendance.is_del, 0), eq(xg_cloud.course_attendance.is_submit, 1), like(xg_cloud.course_attendance.teacher_key, %20101065%, 92)	82	time:1.52s, loops:24, cop_task: {num: 15, max: 370.8ms, min: 668.2µs, avg: 94ms, p95: 370.8ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 1.35s, tot_wait: 2ms, rpc_num: 29, rpc_time: 2.82s, copr_cache_hit_ratio: 0.07, distsql_concurrency: 15}, tikv_task:{proc max:346ms, min:0s, avg: 89.1ms, p80:187ms, p95:346ms, iters:125, tasks:15}, scan_detail: {total_process_keys: 66486, total_process_keys_size: 21619275, total_keys: 66503, rocksdb: {key_skipped_count: 19, block: {cache_hit_count: 523836}}}	N/A	N/A
└─TableRowIDScan_20	cop[tikv]	50.03	table:ca, keep order:false	66742	tikv_task:{proc max:330ms, min:0s, avg: 85.9ms, p80:178ms, p95:330ms, iters:125, tasks:15}	N/A	N/A

translator_bot · June 22, 2024, 5:49am

| username: redgame | Original post link

Have you tried prefix indexing? You can give it a try.

translator_bot · June 22, 2024, 5:49am

| username: 今天不想写代码 | Original post link

I haven’t tried it, could you give me some guidance?

translator_bot · June 22, 2024, 5:49am

| username: zhanggame1 | Original post link

There are these two filters in the execution plan: eq(xg_cloud.course_attendance.is_del, 0) and eq(xg_cloud.course_attendance.is_submit, 1). How much percentage can be filtered out approximately? If the number of filtered rows is small, try adding a combined index on is_del and is_submit, and then check again.

translator_bot · June 22, 2024, 5:49am

| username: Anna | Original post link

You should try TiFlash.