Questions about multi-table join queries in TiDB!

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 多表join查询疑问!

| username: wenyi

Multi-table join queries, each table has multiple regions, and the regions are distributed across multiple TiKV nodes.
I would like to ask:

  1. Do the regions of each table get transmitted from different TiKV nodes to TiDB, and the data association (i.e., condition filtering) is performed in TiDB rather than on the TiKV nodes?
  2. Is the condition pushed down so that each TiKV node filters the data according to the condition, and only the data that meets the condition is transmitted to TiDB for association?
| username: h5n1 | Original post link

Joins are performed in TiDB, and some filter conditions or aggregate functions can be pushed down to TiKV to reduce the amount of data sent to TiDB.

| username: wenyi | Original post link

For multiple tables with billions of rows in a join query, TiDB’s memory requirement is quite large, and even the maximum memory parameter configuration of 32GB per SQL is not sufficient.

| username: h5n1 | Original post link

Yes, it is possible to encounter an OOM (Out of Memory) issue. The specifics depend on the SQL and the execution plan.

| username: Kongdom | Original post link

I recently encountered this situation as well. Hash join is particularly slow, while index join is very fast :joy:

| username: Kongdom | Original post link

Push down as much as possible, and for AP queries that cannot be pushed down, delegate them to TiFlash.

| username: wenyi | Original post link

There are not many scenarios for pushdown, usually it involves ranges rather than aggregation.

| username: TiDB_C罗 | Original post link

If the data can be located in TiKV, it will definitely be sent down; if it cannot be located, it can only be filtered in TiDB.

| username: zhanggame1 | Original post link

The filtering conditions in the WHERE clause are mostly handled by TiKV. If it can be done in TiKV, it won’t be done in TiDB.

| username: TiDBer_jYQINSnf | Original post link

I read it for a long time but couldn’t understand what the question was about. The question seems to be a statement, and I didn’t see any errors. What are the people below answering?

| username: Kongdom | Original post link

You can find an example and post the explain analyze, so everyone can take a look together.

| username: TiDBer_jYQINSnf | Original post link

Teacher Kong, what is this question asking? It seems like I’m the only one who didn’t get it :stuck_out_tongue_closed_eyes:

| username: Kongdom | Original post link

He should be asking about the query logic, is it 1 or 2?

| username: TiDBer_jYQINSnf | Original post link

I misread the question, the one in 1

automatically ignored it :crazy_face:

During the data scanning phase, filtering will be done on TiKV, and only the data that meets the conditions will be sent up, executed according to the region.

| username: redgame | Original post link

TiDB uses query conditions to send requests to TiKV to retrieve data that meets the conditions.

| username: tidb菜鸟一只 | Original post link

Typically, TiKV filters the data first, then returns the filtered data to TiDB for association.

| username: cassblanca | Original post link

Why not perform predicate pushdown to share some of the computational load? The dbserver is under a lot of pressure. :smiley:

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.