TiDB's performance for subquery support is poor, seems worse than MySQL

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb对子查询支持性能不好,感觉比mysql性能差

| username: wenyi

[Test Environment for TiDB]
[TiDB Version] 7.1.1, MySQL 5.7
[Resource Configuration]
TiDB: 3 TiKV nodes, 3 TiDB nodes, each with 32 cores and 128GB of RAM
MySQL: 1 node, 4 cores, 8GB of RAM
The table data volume is the same, and it is the same table.
MySQL query result: 4 seconds, TiDB query result: 144 seconds



Their execution plans are as follows:

| username: xfworld | Original post link

How much data is in the table?

| username: wenyi | Original post link

6 million

| username: dba-kit | Original post link

Try using the NO_DECORRELATE() hint to see if it has any effect.
PS: This hint should be placed inside the subquery.

| username: zhanggame1 | Original post link

After reading the documentation, this hint is for using the apply operator, but the execution plan indeed used the apply operator.

| username: h5n1 | Original post link

Could you upload the result of explain analyze?

| username: tidb菜鸟一只 | Original post link

Drag this out a bit to see which two indexes are used in TiDB.

| username: cy6301567 | Original post link

Have you analyzed it? Did it use the optimal index? If you are performing multi-table join and aggregation operations, you can use TiFlash, it’s very fast.

| username: redgame | Original post link

How much data is returned?

| username: wenyi | Original post link


Untitled.xls (12 KB)

| username: wenyi | Original post link

Uploaded the Excel version of the execution plan for everyone to read carefully.
image
Adding any hint has no optimization effect.

| username: tidb菜鸟一只 | Original post link

The one you uploaded is using TiFlash.

| username: wenyi | Original post link

Automatically choose TiFlash, because my table has TiFlash replicas. In theory, using TiFlash shouldn’t have such poor performance.

| username: Kongdom | Original post link

When performing aggregate queries, TiFlash is extremely fast, but it also depends on the application scenario.

| username: wenyi | Original post link

I forced it to use TiKV, but the performance is still poor. I’ll upload the execution plan later.

| username: h5n1 | Original post link

How long did your explain analyze take to execute? The actual execution used TiFlash, while the previous explain used TiKV. Try adding this hint to your SQL: /+ READ_FROM_STORAGE(TIKV[ca,aa])/

| username: wenyi | Original post link

This is the execution plan for TiKV, and it takes about 150 seconds even with TiKV.

| username: h5n1 | Original post link

User, try the HASH_JOIN(t1_name [, tl_name …]) HINT.

| username: cassblanca | Original post link

I seriously suspect that the storage used by TiDB is too poor. Scanning the index of 6 million records takes nearly 1 minute.

| username: gcworkerishungry | Original post link

You can also check the network situation.