SQL: Speed difference when GROUP BY and WINDOW functions are equivalent

SQL Window Functions (archive) are more powerful than just Group By (archive) queries. Yet, they sometimes are equivalent.

Is there a speed difference when this is the case?

1. Test with a simple SQLite example

Consider the following orders table (here the first 10 rows):

orderid	customerid	total
1001	4308	25.52
1002	11683	35.33
1003	5676	30.79
1004	3097	77.6
1005	10374	109.04
1006	9241	15.13
1007	7189	11.02
1008	7228	33.11
1009	1125	17.19
1010	7340	4.6

We want to know the maximum total price that each customer paid.

2. Two ways to do that

We can either write a group-by query:

select
    customerid,
    max(total) as max
from orders
group by customerid;

or use a window function, and filtering the duplicated with distinct:

select distinct
    customerid,
    max(total) over (partition by customerid) as max
from orders;

The results are identical.

3. Efficiency

However, if we ask SQLite to explain the plan of execution for both queries (and there are no indexes), we get very different plans.

Here is the execution plan for the group-by query:

id	parent	notused	detail
6	0	0	SCAN orders
8	0	0	USE TEMP B-TREE FOR GROUP BY

And here is the execution plan for the window-function query:

id	parent	detail
3	0	CO-ROUTINE (subquery-2)
6	3	SCAN orders
14	3	USE TEMP B-TREE FOR ORDER BY
31	0	SCAN (subquery-2)
88	0	USE TEMP B-TREE FOR DISTINCT

It would be nice if SQLite would also show some cardinality and cost like Oracle SQL database does, but that’s enough to see the gist of it.

4. Why?

Window functions are designed to keep each row first, but you can decide to reduce them all later: this explains why we have a second SCAN happening.

Order by can only reduce the results, and so it can be done more efficiently, without having to keep some useless results for each row.

At some point, we could expect the query optimizer (archive) to optimize both to the same execution plan thanks so some equational reasoning.

Until then, it’s up to you to spot useless use of window functions where group by suffice.

Note	I used SQLite as a demo here, but this is not the only database engine to not perform this optimization.