Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 模糊查询使用FIND_IN_SET需要建立FULLTEXT索引,但是tidb不支持这个索引
[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Fuzzy search using FIND_IN_SET requires creating a FULLTEXT index, but TiDB does not support this index. The table has only 90,000 rows of data, but when the offset for the fuzzy search limit is large, it starts reporting slow queries (above 0.3s).
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
SELECT
…
FROM
… ca
WHERE
ca.is_del = 0
AND ca.user_key LIKE CONCAT(‘%’, ‘abc’, ‘%’)
ORDER BY
ca.course_date DESC
LIMIT
35, 5;
Without considering LIMIT, how many entries meet the criteria? If there are too many, taking more than 0.3 seconds is normal.
Checking the number of entries for a user is only 130. I just tested it, and without limit, it only takes 0.1 seconds. With limit 35,5, it takes 0.3-0.4 seconds. I feel that limit is also quite time-consuming.
Could you please share the execution plan to see if any experts here can optimize it?
I set up TiFlash, but it doesn’t seem to be used.
Supplementary Execution Process
|
id |
task |
estRows |
operator info |
actRows |
execution info |
memory |
disk |
|
Projection_9 |
root |
5 |
xg_cloud.course_attendance.id, xg_cloud.course_attendance.start_time, xg_cloud.course_attendance.end_time, xg_cloud.course_attendance.class_room, xg_cloud.course_attendance.class_name, xg_cloud.course_attendance.jc, xg_cloud.course_attendance.zc, xg_cloud.course_attendance.course_name, date_format(cast(xg_cloud.course_attendance.course_date, datetime(6) BINARY), %m月%d日)->Column#31 |
5 |
time:411ms, loops:2, Concurrency:OFF |
2.71 KB |
N/A |
|
└─Limit_14 |
root |
5 |
offset:35, count:5 |
5 |
time:411ms, loops:2 |
N/A |
N/A |
|
└─IndexLookUp_22 |
root |
40 |
|
40 |
time:411ms, loops:1, index_task: {total_time: 107.9ms, fetch_handle: 3.89ms, build: 10.4ms, wait: 93.6ms}, table_task: {total_time: 1.56s, num: 13, concurrency: 5} |
1.75 MB |
N/A |
|
├─IndexFullScan_19(Build) |
cop[tikv] |
50.03 |
table:ca, index:index_course_date(course_date), keep order:true, desc |
87220 |
time:937.2µs, loops:92, cop_task: {num: 1, max: 591.5µs, proc_keys: 0, rpc_num: 1, rpc_time: 578.7µs, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15}, tikv_task:{time:43ms, loops:90} |
N/A |
N/A |
|
└─Selection_21(Probe) |
cop[tikv] |
40 |
eq(xg_cloud.course_attendance.is_del, 0), eq(xg_cloud.course_attendance.is_submit, 1), like(xg_cloud.course_attendance.teacher_key, %20101065%, 92) |
82 |
time:1.52s, loops:24, cop_task: {num: 15, max: 370.8ms, min: 668.2µs, avg: 94ms, p95: 370.8ms, max_proc_keys: 20480, p95_proc_keys: 20480, tot_proc: 1.35s, tot_wait: 2ms, rpc_num: 29, rpc_time: 2.82s, copr_cache_hit_ratio: 0.07, distsql_concurrency: 15}, tikv_task:{proc max:346ms, min:0s, avg: 89.1ms, p80:187ms, p95:346ms, iters:125, tasks:15}, scan_detail: {total_process_keys: 66486, total_process_keys_size: 21619275, total_keys: 66503, rocksdb: {key_skipped_count: 19, block: {cache_hit_count: 523836}}} |
N/A |
N/A |
|
└─TableRowIDScan_20 |
cop[tikv] |
50.03 |
table:ca, keep order:false |
66742 |
tikv_task:{proc max:330ms, min:0s, avg: 85.9ms, p80:178ms, p95:330ms, iters:125, tasks:15} |
N/A |
N/A |
Have you tried prefix indexing? You can give it a try.
I haven’t tried it, could you give me some guidance?
There are these two filters in the execution plan: eq(xg_cloud.course_attendance.is_del, 0) and eq(xg_cloud.course_attendance.is_submit, 1). How much percentage can be filtered out approximately? If the number of filtered rows is small, try adding a combined index on is_del and is_submit, and then check again.