SQL Best-of-Class Selections
April 12th, 2002 by Dan Wilder in
One may encounter a requirement where records in a table are to be grouped by the value of a column, and we need to select the record with the maximum or minimum value on some other column.
For example, suppose there is a database that contains a table with a record for each issue of a magazine sent under some subscription id, and we wish to find the most recent magazine sent for each subscription. The ID of a subscription is subs_id, and the unique key of the "sent" table is sent_id.
In a DBMS featuring subqueries, this is easy:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id = ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 1);
This is called a "linked subquery" in which an input to the inner query is taken from the outer query. Since we are comparing records in the same table, we must use an alias for the table in either the inner or the outer query.
Conceptually, for each record in the outer query, we execute the inner query once, passing in the value of the grouping column, subs_id.
The inner query then considers all records having that subs_id in descending order by date created, returning sent_id, the key from the single record with latest date_created.
If that key matches the key of the record being considered by the outer query, the outer SELECT produces values from that record. Otherwise it goes on to the next.
In fact, a DBMS will optimize this but the optimization need not concern us, beyond noting that such a query executes a great deal faster than one might expect if the DBMS were literally proceeding as described.
The result: the newest record is selected for each class of "sent" records having the same subs_id.
This may be generalized to produce, say, the top five in each group, by changing the outer comparison from equality to membership and the LIMIT clause in the inner query:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id IN ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 5);
The other extreme on the grouping field may be had by dropping the DESC keyword.
Dan Wilder is Technical Manager (a.k.a. Chief Geek) at Specialized Systems Consultants, Inc.
Special Magazine Offer -- 2 Free Trial Issues!
Receive 2 free trial issues of Linux Journal as well as instant online access to current and past issues. There's NO RISK and NO OBLIGATION to buy. CLICK HERE for offer
Linux Journal: delivering readers the advice and inspiration they need to get the most out of their Linux systems since 1994.
Sorry, offer available in the US only. International orders, click here.
Subscribe now!
The Latest
Featured Videos
Email is one of the least private and least secure forms of communication, although few people realize this. MixMaster is one way to allow secure, anonymous communication even over the very public medium of email. This tutorial will get you started with MixMaster quickly and easily.
In case you were wondering about the fun side of Linux World Expo, we thought we'd give you a peek at our shenanigans. We at Linux Journal love what we do so much, that we can't help but have a ball wherever we go.
Recently Popular
From the Magazine
September 2008, #173
Feeling a bit like a Thermian? Never give up, never surrender! Someday, you could go from underdog to top dog. Just take a look at a few of the underdogs we highlight in this issue: Mutt, djbdns, Nginix, Gentoo, Xara and the program voted mostly likely to fail just a few years back—Firefox. If Firefox is not radical enough for you, check out Chef Marcel's column for some more alternatives. Having trouble mapping your program data to your relational database? If so, Rueven Lerner shows you some tricks in his At The Forge column.
Need to run GUI applications on your server in the next state? In his Paranoid Penguin column, Mick Bauer shows you how to do it securely. Kyle Rankin keeps hacking and slashing and shows you a few split screen secrets you may not be familiar with. Finally, we all know what happens next February, but only Doc knows what happens afterward.
Delicious
Digg
Reddit
Newsvine
Technorati







linux
On October 2nd, 2007 garbage enigmatic encounter mp3 download (not verified) says:
Where and are asm/*.h? linux/*.h Thanks.
Re: SQL Best-of-Class Selections
On April 21st, 2002 Anonymous says:
Wouldn't this be a lot faster & simpler using GROUP BY?
i.e. (untested code):
SELECT sent_id, max(subs_id) FROM sent GROUP BY sent_id;
Re: SQL Best-of-Class Selections
On April 29th, 2002 Anonymous says:
or even better:
SELECT sent_id, subs_id
FROM sent
WHERE subs_id =
(SELECT MAX(subs_id)
FROM sent);
I'm not sure why the original author went to so much trouble over such a simple problem :)
Re: SQL Best-of-Class Selections
On April 17th, 2002 Anonymous says:
What dbms is this syntax for? It's a little different from the big boys I've worked with (Oracle & Sybase) and more advanced than MySQL.
Could this work as a self join? I think it would look something like:
SELECT sent_id, subs_id
FROM sent AS outer_sent, sent AS inner_sent
WHERE outer_sent.subs_id = inner_sent.subs_id
AND outer.date_created = max(inner.date_created)
ORDER BY date_created;
If it works it would be interesting to time the two and see which is faster. Usual rule I've found is that the subquery is always faster than the join query (YRMV).
Re: SQL Best-of-Class Selections
On March 11th, 2003 Anonymous says:
It's probably as close as you can get without using a subselect, but the join returns all results with the maximum date whereas the original solution returns one for each maximum date.
Also, the join is likely to be slow if date_created is not indexed.
PostgreSQL?
On April 21st, 2002 Anonymous says:
Perhaps the author wrote the SQL with PostgreSQL in mind? It supports sub-queries and is open-source...