SQL Best-of-Class Selections
One may encounter a requirement where records in a table are to be grouped by the value of a column, and we need to select the record with the maximum or minimum value on some other column.
For example, suppose there is a database that contains a table with a record for each issue of a magazine sent under some subscription id, and we wish to find the most recent magazine sent for each subscription. The ID of a subscription is subs_id, and the unique key of the "sent" table is sent_id.
In a DBMS featuring subqueries, this is easy:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id = ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 1);
This is called a "linked subquery" in which an input to the inner query is taken from the outer query. Since we are comparing records in the same table, we must use an alias for the table in either the inner or the outer query.
Conceptually, for each record in the outer query, we execute the inner query once, passing in the value of the grouping column, subs_id.
The inner query then considers all records having that subs_id in descending order by date created, returning sent_id, the key from the single record with latest date_created.
If that key matches the key of the record being considered by the outer query, the outer SELECT produces values from that record. Otherwise it goes on to the next.
In fact, a DBMS will optimize this but the optimization need not concern us, beyond noting that such a query executes a great deal faster than one might expect if the DBMS were literally proceeding as described.
The result: the newest record is selected for each class of "sent" records having the same subs_id.
This may be generalized to produce, say, the top five in each group, by changing the outer comparison from equality to membership and the LIMIT clause in the inner query:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id IN ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 5);
The other extreme on the grouping field may be had by dropping the DESC keyword.
Dan Wilder is Technical Manager (a.k.a. Chief Geek) at Specialized Systems Consultants, Inc.










This week 5 lucky Members will receive a Root Superhero T-shirt, as modeled by
Hack Editor Kyle Rankin. No entry necessary. Check back here early next week
to find out who the lucky Online Members are.




Comments
Re: SQL Best-of-Class Selections
Wouldn't this be a lot faster & simpler using GROUP BY?
i.e. (untested code):
SELECT sent_id, max(subs_id) FROM sent GROUP BY sent_id;
Re: SQL Best-of-Class Selections
or even better:
SELECT sent_id, subs_id
FROM sent
WHERE subs_id =
(SELECT MAX(subs_id)
FROM sent);
I'm not sure why the original author went to so much trouble over such a simple problem :)
Re: SQL Best-of-Class Selections
What dbms is this syntax for? It's a little different from the big boys I've worked with (Oracle & Sybase) and more advanced than MySQL.
Could this work as a self join? I think it would look something like:
SELECT sent_id, subs_id
FROM sent AS outer_sent, sent AS inner_sent
WHERE outer_sent.subs_id = inner_sent.subs_id
AND outer.date_created = max(inner.date_created)
ORDER BY date_created;
If it works it would be interesting to time the two and see which is faster. Usual rule I've found is that the subquery is always faster than the join query (YRMV).
Re: SQL Best-of-Class Selections
It's probably as close as you can get without using a subselect, but the join returns all results with the maximum date whereas the original solution returns one for each maximum date.
Also, the join is likely to be slow if date_created is not indexed.
PostgreSQL?
Perhaps the author wrote the SQL with PostgreSQL in mind? It supports sub-queries and is open-source...
Post new comment