Fuzzy query using FIND_IN_SET requires a FULLTEXT index, but TiDB does not support this index

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 模糊查询使用FIND_IN_SET需要建立FULLTEXT索引,但是tidb不支持这个索引

| username: 今天不想写代码

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Fuzzy search using FIND_IN_SET requires creating a FULLTEXT index, but TiDB does not support this index. The table has only 90,000 rows of data, but when the offset for the fuzzy search limit is large, it starts reporting slow queries (above 0.3s).
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
SELECT

FROM
… ca
WHERE
ca.is_del = 0
AND ca.user_key LIKE CONCAT(‘%’, ‘abc’, ‘%’)
ORDER BY
ca.course_date DESC
LIMIT
35, 5;

| username: zhanggame1 | Original post link

Without considering LIMIT, how many entries meet the criteria? If there are too many, taking more than 0.3 seconds is normal.

| username: 今天不想写代码 | Original post link

Checking the number of entries for a user is only 130. I just tested it, and without limit, it only takes 0.1 seconds. With limit 35,5, it takes 0.3-0.4 seconds. I feel that limit is also quite time-consuming.

| username: tidb狂热爱好者 | Original post link

You can try TiFlash.

| username: zhanggame1 | Original post link

Could you please share the execution plan to see if any experts here can optimize it?

| username: 今天不想写代码 | Original post link

I set up TiFlash, but it doesn’t seem to be used.

| username: 今天不想写代码 | Original post link

Supplementary Execution Process

id task estRows operator info actRows execution info memory disk
Projection_9 root 5 xg_cloud.course_attendance.id, xg_cloud.course_attendance.start_time, xg_cloud.course_attendance.end_time, xg_cloud.course_attendance.class_room, xg_cloud.course_attendance.class_name, xg_cloud.course_attendance.jc, xg_cloud.course_attendance.zc, xg_cloud.course_attendance.course_name, date_format(cast(xg_cloud.course_attendance.course_date, datetime(6) BINARY), %m月%d日)->Column#31 5 time:411ms, loops:2, Concurrency:OFF 2.71 KB N/A
└─Limit_14 root 5 offset:35, count:5 5 time:411ms, loops:2 N/A N/A
└─IndexLookUp_22 root 40 40 time:411ms, loops:1, index_task: {total_time: 107.9ms, fetch_handle: 3.89ms, build: 10.4ms, wait: 93.6ms}, table_task: {total_time: 1.56s, num: 13, concurrency: 5} 1.75 MB N/A
├─IndexFullScan_19(Build) cop[tikv] 50.03 table:ca, index:index_course_date(course_date), keep order:true, desc 87220 time:937.2µs, loops:92, cop_task: {num: 1, max: 591.5µs, proc_keys: 0, rpc_num: 1, rpc_time: 578.7µs, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15}, tikv_task:{time:43ms, loops:90} N/A N/A
└─Selection_21(Probe) cop[tikv] 40 eq(xg_cloud.course_attendance.is_del, 0), eq(xg_cloud.course_attendance.is_submit, 1), like(xg_cloud.course_attendance.teacher_key, %20101065%, 92) 82 time:1.52s, loops:24, cop_task: {num: 15, max: 370.8ms, min: 668.2µs, avg: 94ms, p95: 370.8ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 1.35s, tot_wait: 2ms, rpc_num: 29, rpc_time: 2.82s, copr_cache_hit_ratio: 0.07, distsql_concurrency: 15}, tikv_task:{proc max:346ms, min:0s, avg: 89.1ms, p80:187ms, p95:346ms, iters:125, tasks:15}, scan_detail: {total_process_keys: 66486, total_process_keys_size: 21619275, total_keys: 66503, rocksdb: {key_skipped_count: 19, block: {cache_hit_count: 523836}}} N/A N/A
└─TableRowIDScan_20 cop[tikv] 50.03 table:ca, keep order:false 66742 tikv_task:{proc max:330ms, min:0s, avg: 85.9ms, p80:178ms, p95:330ms, iters:125, tasks:15} N/A N/A
| username: redgame | Original post link

Have you tried prefix indexing? You can give it a try.

| username: 今天不想写代码 | Original post link

I haven’t tried it, could you give me some guidance?

| username: zhanggame1 | Original post link

There are these two filters in the execution plan: eq(xg_cloud.course_attendance.is_del, 0) and eq(xg_cloud.course_attendance.is_submit, 1). How much percentage can be filtered out approximately? If the number of filtered rows is small, try adding a combined index on is_del and is_submit, and then check again.

| username: Anna | Original post link

You should try TiFlash.