SQL Best-of-Class Selections
One may encounter a requirement where records in a table are to be grouped by the value of a column, and we need to select the record with the maximum or minimum value on some other column.
For example, suppose there is a database that contains a table with a record for each issue of a magazine sent under some subscription id, and we wish to find the most recent magazine sent for each subscription. The ID of a subscription is subs_id, and the unique key of the "sent" table is sent_id.
In a DBMS featuring subqueries, this is easy:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id = ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 1);
This is called a "linked subquery" in which an input to the inner query is taken from the outer query. Since we are comparing records in the same table, we must use an alias for the table in either the inner or the outer query.
Conceptually, for each record in the outer query, we execute the inner query once, passing in the value of the grouping column, subs_id.
The inner query then considers all records having that subs_id in descending order by date created, returning sent_id, the key from the single record with latest date_created.
If that key matches the key of the record being considered by the outer query, the outer SELECT produces values from that record. Otherwise it goes on to the next.
In fact, a DBMS will optimize this but the optimization need not concern us, beyond noting that such a query executes a great deal faster than one might expect if the DBMS were literally proceeding as described.
The result: the newest record is selected for each class of "sent" records having the same subs_id.
This may be generalized to produce, say, the top five in each group, by changing the outer comparison from equality to membership and the LIMIT clause in the inner query:
SELECT sent_id, subs_id FROM sent AS outer_sent WHERE sent_id IN ( SELECT sent_id FROM sent WHERE subs_id = outer_sent.subs_id ORDER BY date_created DESC LIMIT 5);
The other extreme on the grouping field may be had by dropping the DESC keyword.
Dan Wilder is Technical Manager (a.k.a. Chief Geek) at Specialized Systems Consultants, Inc.