Confusion about read_bytes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: read_bytes 疑惑

| username: 等一分钟

【TiDB Usage Environment】Production Environment
【TiDB Version】6.1.7

The read_byte here: 7.92G, does it mean that such a large amount of data was queried from TiKV? But the memory usage doesn’t seem to be much.

| username: zhanggame1 | Original post link

There are multiple TiKVs, so the usage for each one is not much.

| username: 等一分钟 | Original post link

Actually, my psnTemp table only has 25 rows of data, and the result of the left join doesn’t have much data either. How could it be querying so much data?

| username: 等一分钟 | Original post link

The data volume is 7.92G, and the memory usage is not that large either.

| username: h5n1 | Original post link

When the ACD table was read back, 4.79 million rows of data were read.

| username: 等一分钟 | Original post link

Shouldn’t the psnTemp table join with the act table for 10,000 rows of data, and then return to the table?

| username: 等一分钟 | Original post link

Can any expert help clarify this?

| username: TiDBer_小阿飞 | Original post link

Look at the loops, the whole process looped so many times. If you calculate it using multiplication, you can roughly estimate how much was read, right? :grinning:

| username: 等一分钟 | Original post link

Isn’t the execution plan for this SQL in TiDB very poor?

| username: 有猫万事足 | Original post link

Use TiFlash. If aggregation cannot be pushed down, it won’t be fast no matter what.

| username: 等一分钟 | Original post link

The execution plan or optimizer is not good enough. Originally, there were only 10,000 rows of data after joining tables, but TiDB fetched all the data from the large table into memory.

| username: 等一分钟 | Original post link

Is the second hash join executed separately here?

| username: 等一分钟 | Original post link

Is the 5.57G memory usage for just this single operation, or is it the total usage for all the actions below it?

| username: 等一分钟 | Original post link

Is this maximum memory all occupied by the tidb_server node?

| username: heiwandou | Original post link

The optimizer is not good.

| username: tidb菜鸟一只 | Original post link

I think if you recollect the statistics for this table, the execution plan will be very different.

| username: 等一分钟 | Original post link

This is a temporary table that was just generated.

| username: 等一分钟 | Original post link

Is there an expert here? :sweat_smile:

| username: tidb菜鸟一只 | Original post link

The optimizer thinks that the psntemp table is larger than the aac table, so it first performs a hash join between the aac and acd tables. Therefore, I suggest you collect the statistics of the psntemp table, or you can specify a hint to let the psntemp and acd tables join first and see if it works.

| username: 等一分钟 | Original post link

The execution plan for the updated psntemp table’s statistics is the same.

How do you specify the hint to make psntemp and acd tables join first?