Can too many visible versions cause queries to slow down?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 可见版本数过多会导致查询变慢么?

| username: Kongdom

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.1.0

A simple aggregate query can hit the index, but the query time exceeds 1 minute. The GC setting is 24h.
Can too many replicas cause slow queries?

select aa, sum(c_amount) from table 
where c_date > '2023-05-01' 
and c_date < '2023-07-01'  
group by aa
order by aa
limit 10

| username: xfworld | Original post link

Yes, when reading from a large amount of data, needing to skip can slow down the get speed.

If necessary, you can manually release the version.

| username: Kongdom | Original post link

I also found another problem, why was hashagg used twice :sweat_smile:

| username: xfworld | Original post link

Once on TiKV, once on TiDB

Each TiKV node gets the grouping results, then aggregates them on TiDB for grouping and sorting…

This ensures the correct results are obtained. (After all, the data is distributed across various TiKV nodes and cannot be grouped and aggregated in one go.)

| username: Kongdom | Original post link

So why did it perform an IndexRangeScan once and then a TableRowIDScan once?

| username: zhanggame1 | Original post link

Of course, it will be slower. TiDB’s MVCC puts historical data and current data together.

| username: redgame | Original post link

It has an impact.

| username: zhanggame1 | Original post link

TableRangeScan: Table data scan with a range.
TableRowIDScan: Scans table data based on the RowID passed down from the upper layer. Often used to retrieve rows that meet the criteria after an index read operation.

Specifically, it means finding the specific rowid through the index on the c_date column, and then going back to the table to fetch data using the rowid.

With 13 million data entries and no garbage collection, it won’t be fast.

| username: 裤衩儿飞上天 | Original post link

Table lookup.

| username: Kongdom | Original post link

:rofl: I forcibly used TiFlash, and the time reduced from 1 minute to 7 seconds. It seems that I still have to forcibly use TiFlash.

| username: 有猫万事足 | Original post link

The first time is group by, the second time is limit. After hashagg, immediately topn, which is limit.

| username: Kongdom | Original post link

:wink: Got it, we’ve decided to use TiFlash.

| username: DBRE | Original post link

How to manually release a version?

| username: xfworld | Original post link

Logically, the data has been deleted, and this operation must be performed.

Then, manually control the GC of TiKV to achieve the release.

Reference documentation:

| username: DBRE | Original post link

Got it~ :+1:

| username: Kongdom | Original post link

:call_me_hand: :call_me_hand: :call_me_hand:

| username: cy6301567 | Original post link

Is there a composite index on date and aa?

| username: Kongdom | Original post link

No, only c_date has an index.

| username: cy6301567 | Original post link

Try composite indexes.

| username: Kongdom | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.