Optimizing Slow Queries for Large Tables and Index Optimization: How to Improve Table Query Efficiency?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大表慢查询优化,索引优化。如何提高表查询效率?

| username: zhimadi

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.2
[Reproduction Path] Users query business data through the system, and the database responds slowly.
[Problem Encountered: Phenomenon and Impact]
The project is a SaaS system, with each table distinguished by a tenant ID. Additionally, the business in the system is quite complex, often using multi-table join queries.
Problem encountered: Business tables with around 100 million rows of data are slow even for single-table queries. Typically, it takes about 2-3 seconds for a customer to query one month’s worth of documents.
For example: t1 is the sales order table, indexed by time and tenant ID.
SELECT column1, column2, … FROM t1 WHERE <time condition> AND <tenant ID condition>;
There is no issue for tenants with a small amount of data, but when a tenant with tens of millions of rows queries historical documents, the statement above uses the tenant ID index and responds very slowly. With only about 100 million rows, it shouldn't be this slow.
I have tried partitioning the table by time or by tenant ID, but the effect was not good. Moreover, since this is a distributed database, is using partitioned tables even the right approach to improving query efficiency?
Could the experts please advise on how to handle this? Are there any other solutions?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: 托马斯滑板鞋 | Original post link

Are you using TiFlash? I checked the documentation, and it seems that version 5.4 supports pushing down aggregate functions like SUM.

| username: zhimadi | Original post link

We don't have TiFlash. What does supporting aggregate functions like SUM mean?
Also, does TiDB support conditional (partial) indexes, similar to PostgreSQL?

| username: 托马斯滑板鞋 | Original post link

TiFlash stores columnar replicas (the table itself effectively acts as the index); you can think of it as a columnar database. It greatly improves performance for analytical queries over large tables.
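As a minimal sketch, assuming the sales order table from the original post is named `t1` and one replica is enough, adding a TiFlash columnar replica in TiDB looks like this:

```sql
-- Create one columnar replica of the table on TiFlash
ALTER TABLE t1 SET TIFLASH REPLICA 1;

-- Check replication progress; AVAILABLE = 1 means the replica is ready
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 't1';
```

Once `AVAILABLE` is 1, the optimizer can choose the TiFlash replica for suitable queries.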

| username: xingzhenxiang | Original post link

Try creating a composite index based on the query. Not sure if it will work.

| username: h5n1 | Original post link

Pick one of the slow SQL queries, get its execution plan, and post that information here.

| username: zhimadi | Original post link

Attached image:

| username: h5n1 | Original post link

What you posted is not the `select column1, column2 … from t1 where time and tenant ID` kind of SQL. Please copy the execution plan as text and post it, so the details can be seen.
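For reference, TiDB can produce the plan as text. A sketch following the query shape from the original post (column names like `tenant_id` and `create_time`, and the literal values, are placeholders):

```sql
-- EXPLAIN ANALYZE actually executes the statement and reports
-- per-operator execution time, actual row counts, and memory usage
EXPLAIN ANALYZE
SELECT column1, column2
FROM t1
WHERE tenant_id = 12345
  AND create_time >= '2023-01-01' AND create_time < '2023-02-01';
```

The text output can be copied straight into a post, which is easier to read than a screenshot.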

| username: tidb菜鸟一只 | Original post link

Seeing so many SUM statements, I suggest directly using TiFlash. The effect will definitely be excellent.

| username: zhimadi | Original post link

Here is the example statement I posted.
Can I send you the execution plan text privately after copying it?

| username: zhimadi | Original post link

What is the typical configuration for TiFlash? How many machines are needed to start?

| username: tidb菜鸟一只 | Original post link

This depends on the size of your data. First determine which statistical SQL your current large tables can no longer handle on TiKV, and how many resources would be needed to support them. My suggestion is to deploy a test environment first, migrate the data behind your largest and slowest statistical SQL into it, and compare query efficiency on TiKV versus TiFlash. That will show how many TiFlash nodes you need relative to your current TiKV nodes.

| username: h5n1 | Original post link

What type of disk do you have?

| username: 魔礼养羊 | Original post link

I’m not sure whether my approach will help. Ultimately, SQL optimization boils down to two directions: trading space for time, and ruling out environmental factors. Summarized as follows:

  1. Establish a data warehouse. If users have fixed requirements, can the data tables be modeled and solidified into reports, such as monthly thematic reports? This way, queries only need to access these reports. The downside is that if the requirements are temporary or sudden, creating reports is not very meaningful.
  2. Add a Redis cache component to cache specific needs if the data is frequently used.
  3. Switch to TiFlash columnar database storage. Considering you are querying columns, I estimate the optimization probability is high. You can try it in a test environment first.
  4. Is there a possibility of data skew? For example, you partition and shard the data, but the tables actually end up on only three or four servers. Check whether your data maintenance tools can inspect this.
  5. Resource interference. Check the server’s CPU/memory/network and compare the situation before and after executing the query. If the resource utilization (especially memory) is already high before the query, it might be due to insufficient resources.

Personal suggestion: When you encounter a problem, it’s best to create a mind map and then use simple binary (pros and cons) or ternary (left, middle, right) methods for exhaustive analysis. In data management issues, thinking generally precedes technology.

| username: 人如其名 | Original post link

This indicates there is still room to optimize your index. An index scan of 180,000 rows followed by table lookups is already fairly fast; adding tdate to the index will make it even faster.
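A hedged sketch of the suggested index change (the column names `tenant_id` and `tdate` follow the thread; the table name `t1` and the exact column list are assumptions):

```sql
-- Composite index: locate the tenant first, then range-scan by date,
-- instead of scanning all of the tenant's rows and filtering tdate afterwards
ALTER TABLE t1 ADD INDEX idx_tenant_tdate (tenant_id, tdate);
```

With both filter columns in one index, the 180,000-row index scan shrinks to just the rows in the requested date range.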

| username: zhimadi | Original post link

I’m not very familiar with TiFlash. Can some of the statistical SQL be run there? Would that mean writing additional business code for those queries?
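For what it's worth, once TiFlash replicas are available the TiDB optimizer chooses between TiKV and TiFlash by cost, so existing SQL usually needs no changes; engine selection can also be steered explicitly. A sketch (the table `t1` and the column names `tenant_id` and `amount` are assumptions):

```sql
-- Let this session read from TiFlash (plus TiDB memory tables) only
SET SESSION tidb_isolation_read_engines = 'tidb,tiflash';

-- Or force TiFlash for a single statement via an optimizer hint
SELECT /*+ READ_FROM_STORAGE(TIFLASH[t1]) */ tenant_id, SUM(amount)
FROM t1
GROUP BY tenant_id;
```

Either way, the application keeps issuing ordinary SQL; no separate query code path is required.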

| username: zhimadi | Original post link

ESSD cloud disk with 11,800 provisioned IOPS; the single-disk IOPS limit is 50,000.

| username: zhimadi | Original post link

tdate is currently a separate index.

| username: h5n1 | Original post link

Check the disk performance.

| username: zhimadi | Original post link

The suggestions are very detailed and have many thought-provoking points. Thank you.