SQL Window Functions (archive) are more powerful than just Group By (archive) queries. Yet, they sometimes are equivalent.

Is there a speed difference when this is the case?

1. Test with a simple SQLite example

Consider the following orders table (here the first 10 rows):

orderid customerid total

1001

4308

25.52

1002

11683

35.33

1003

5676

30.79

1004

3097

77.6

1005

10374

109.04

1006

9241

15.13

1007

7189

11.02

1008

7228

33.11

1009

1125

17.19

1010

7340

4.6

We want to know the maximum total price that each customer paid.

2. Two ways to do that

We can either write a group-by query:

select
    customerid,
    max(total) as max
from orders
group by customerid;

or use a window function, and filtering the duplicated with distinct:

select distinct
    customerid,
    max(total) over (partition by customerid) as max
from orders;

The results are identical.

3. Efficiency

However, if we ask SQLite to explain the plan of execution for both queries (and there are no indexes), we get very different plans.

Here is the execution plan for the group-by query:

id parent notused detail

6

0

0

SCAN orders

8

0

0

USE TEMP B-TREE FOR GROUP BY

And here is the execution plan for the window-function query:

id parent notused detail

3

0

0

CO-ROUTINE (subquery-2)

6

3

0

SCAN orders

14

3

0

USE TEMP B-TREE FOR ORDER BY

31

0

0

SCAN (subquery-2)

88

0

0

USE TEMP B-TREE FOR DISTINCT

It would be nice if SQLite would also show some cardinality and cost like Oracle SQL database does, but that’s enough to see the gist of it.

4. Why?

Window functions are designed to keep each row first, but you can decide to reduce them all later: this explains why we have a second SCAN happening.

Order by can only reduce the results, and so it can be done more efficiently, without having to keep some useless results for each row.

At some point, we could expect the query optimizer (archive) to optimize both to the same execution plan thanks so some equational reasoning.

Until then, it’s up to you to spot useless use of window functions where group by suffice.

Note
I used SQLite as a demo here, but this is not the only database engine to not perform this optimization.