Slow Query Issues with Large Text Fields (LONGTEXT)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 慢查询 大文本字段longtext查询问题

| username: Dais

[TiDB Usage Environment] Testing
[TiDB Version] V7.1.2
[Reproduction Path]
Table T (TS_DOCKPF_RECORD) data volume: 5,581 rows; table T1 (TS_DOCKPF) data volume: 22 rows.

In table T, the fields RECORD_REQ and RECORD_RES are of LONGTEXT type. The content stored in these two fields is approximately 180 KB per row.

Executing the following SQL:
EXPLAIN ANALYZE SELECT
T.DOCKPF_ID AS "fkByDockpfId.rid",
T.BUS_TYPE AS busType,
T.RECORD_UID AS recordUid,
T.RECORD_REQ AS recordReq,
T.RECORD_RES AS recordRes,
T.DOCK_CONTENT AS dockContent,
T.RECORD_STATE AS recordState,
T.RECORD_TIME AS recordTime,
T.RID AS rid,
T.IF_DELETE AS ifDelete
FROM
TS_DOCKPF_RECORD T
LEFT JOIN TS_DOCKPF T1 ON T1.RID = T.DOCKPF_ID
WHERE
T1.UID = '78AA82AA0C01474C88238161234567890'
AND T.BUS_TYPE = 31
AND T.RECORD_STATE = 0
AND T.IF_DELETE = 0
ORDER BY
T.MODIFY_TIME ASC,
T.RID ASC
LIMIT
10;

Execution time: 30s+

If the two large fields are removed from the SELECT list, the SQL is as follows:
EXPLAIN ANALYZE SELECT
T.DOCKPF_ID AS "fkByDockpfId.rid",
T.BUS_TYPE AS busType,
T.RECORD_UID AS recordUid,
T.DOCK_CONTENT AS dockContent,
T.RECORD_STATE AS recordState,
T.RECORD_TIME AS recordTime,
T.RID AS rid,
T.IF_DELETE AS ifDelete
FROM
TS_DOCKPF_RECORD T
LEFT JOIN TS_DOCKPF T1 ON T1.RID = T.DOCKPF_ID
WHERE
T1.UID = '78AA82AA0C01474C88238161234567890'
AND T.BUS_TYPE = 31
AND T.RECORD_STATE = 0
AND T.IF_DELETE = 0
ORDER BY
T.MODIFY_TIME ASC,
T.RID ASC
LIMIT
10;
Execution time: 30ms

Execution plans for both queries
bbb.sql (27.2 KB)

Could the experts please advise how to optimize such long-running queries on large fields?

| username: 小龙虾爱大龙虾 | Original post link

Two approaches:

  1. If the ORDER BY in the SQL can be removed, remove it; that should make it faster.
  2. Since the second SQL is fast, rewrite the two large fields as scalar subqueries in the SELECT list, using table T's primary key to look them up (see the sketch below).
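A minimal sketch of approach 2, assuming RID is the primary key of TS_DOCKPF_RECORD (column names are taken from the original query):

SELECT
T.DOCKPF_ID AS "fkByDockpfId.rid",
T.BUS_TYPE AS busType,
T.RECORD_UID AS recordUid,
-- Large fields fetched via scalar subqueries on the primary key instead of being carried through the join:
(SELECT R.RECORD_REQ FROM TS_DOCKPF_RECORD R WHERE R.RID = T.RID) AS recordReq,
(SELECT R.RECORD_RES FROM TS_DOCKPF_RECORD R WHERE R.RID = T.RID) AS recordRes,
T.DOCK_CONTENT AS dockContent,
T.RECORD_STATE AS recordState,
T.RECORD_TIME AS recordTime,
T.RID AS rid,
T.IF_DELETE AS ifDelete
FROM
TS_DOCKPF_RECORD T
LEFT JOIN TS_DOCKPF T1 ON T1.RID = T.DOCKPF_ID
WHERE
T1.UID = '78AA82AA0C01474C88238161234567890'
AND T.BUS_TYPE = 31
AND T.RECORD_STATE = 0
AND T.IF_DELETE = 0
ORDER BY
T.MODIFY_TIME ASC,
T.RID ASC
LIMIT
10;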
| username: 小王同学Plus | Original post link

You need to check the slow log specifically; with LONGTEXT large fields, parsing may be slow.
These two execution plans are actually the same.
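For example, a quick way to see where the time goes is to check the recent slow-log entries (a sketch only; the INFORMATION_SCHEMA.SLOW_QUERY table and its Query_time / Parse_time / Compile_time columns are assumed to be available in this TiDB version):

-- Compare parse/compile time against total query time for the slow statement.
SELECT Time, Query_time, Parse_time, Compile_time, Query
FROM INFORMATION_SCHEMA.SLOW_QUERY
WHERE Query LIKE '%TS_DOCKPF_RECORD%'
ORDER BY Time DESC
LIMIT 5;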

| username: 像风一样的男子 | Original post link

Your data is too large; most of the time is spent on data transmission. Whoever designed the schema to store such large values should be taken out and shot.

| username: forever | Original post link

The two execution plans look the same, but if you look at scan_detail in the execution info of the Selection_25 (Probe) operator, the number of keys scanned without the large fields is many times higher. That is a bit abnormal. I checked the documentation and could not find how large fields are stored. Does anyone know whether their storage differs from that of normal columns?

| username: forever | Original post link

Wrap it in another layer: use the current query, selecting only table T's primary key, as a derived table C, then join back to table T to fetch the required data and try it out (first query the primary keys with LIMIT 10, then fetch those 10 rows). For example:
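A minimal sketch of that rewrite, assuming RID is the primary key of TS_DOCKPF_RECORD:

SELECT
T.DOCKPF_ID AS "fkByDockpfId.rid",
T.BUS_TYPE AS busType,
T.RECORD_UID AS recordUid,
T.RECORD_REQ AS recordReq,
T.RECORD_RES AS recordRes,
T.DOCK_CONTENT AS dockContent,
T.RECORD_STATE AS recordState,
T.RECORD_TIME AS recordTime,
T.RID AS rid,
T.IF_DELETE AS ifDelete
FROM
-- Derived table C: filter, sort, and limit on the primary key only, without touching the large columns.
(
SELECT T.RID
FROM TS_DOCKPF_RECORD T
LEFT JOIN TS_DOCKPF T1 ON T1.RID = T.DOCKPF_ID
WHERE T1.UID = '78AA82AA0C01474C88238161234567890'
AND T.BUS_TYPE = 31
AND T.RECORD_STATE = 0
AND T.IF_DELETE = 0
ORDER BY T.MODIFY_TIME ASC, T.RID ASC
LIMIT 10
) C
-- Then read only those 10 full rows, including the LONGTEXT columns, by primary key.
JOIN TS_DOCKPF_RECORD T ON T.RID = C.RID
ORDER BY
T.MODIFY_TIME ASC,
T.RID ASC;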

| username: tidb菜鸟一只 | Original post link

I suggest you run both SQL statements again with SQL_NO_CACHE; I suspect caching is interfering.

| username: Dais | Original post link

Slow log with large field queries

| username: TiDBer_gxUpi9Ct | Original post link

The data is too large.

| username: Dais | Original post link

  1. Removing the ORDER BY from the SQL does not noticeably improve the speed; the issue still lies with the large fields.
  2. Our current plan is likewise to use the primary keys to go back to table T for the LIMIT 10 rows of data. We are just wondering whether there is a better solution.
| username: 像风一样的男子 | Original post link

You can see which stage is slow on the slow query page of the TiDB Dashboard.

| username: 小龙虾爱大龙虾 | Original post link

Why don’t you take a look at both execution plans?

| username: Dais | Original post link

Execution plans with large fields, without large fields, without sorting, and without back table lookups.
Execution Plan.sql (56.7 KB)

| username: 小龙虾爱大龙虾 | Original post link

Well, removing the order by in this scenario is indeed not very useful, because after filtering TS_DOCKPF_RECORD, there is only one record left.

| username: dba远航 | Original post link

Querying large fields is not fast and memory usage is high.

| username: TiDBer_小阿飞 | Original post link

There are several optimization ideas for large field types, all with the ultimate goal of reducing memory usage and improving hit rates:

  1. Universal solution: indexing. Add an index on the field; for long fields, consider a prefix index or a full-text index (see the sketch after this list).
  2. If you only query part of the content of the long field, consider paginated queries to reduce query time.
  3. Query optimization: check whether the SQL can be further optimized to avoid full table scans.
  4. Database sharding: consider sharding the database to avoid full table scans.
  5. Cache technology: consider Redis, Memcached, etc., to cache the results, reducing database access and saving resource consumption.
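As an illustration of idea 1, a prefix index on one of the LONGTEXT columns might look like the sketch below (the index name and the 64-character prefix length are arbitrary assumptions; such an index only helps queries that actually filter on that column, which the query in this thread does not):

-- Hypothetical prefix index over the first 64 characters of the LONGTEXT column.
ALTER TABLE TS_DOCKPF_RECORD ADD INDEX idx_record_req_prefix (RECORD_REQ(64));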
| username: TiDBer_gxUpi9Ct | Original post link

High memory usage

| username: andone | Original post link

The data volume is too large; I suggest optimizing it.