Integrating SQL with CGI, Part 2

This month we learn additional ways to retrieve data from a relational database and ways to divide our data into multiple tables for maximum efficiency.
Joining two tables

Our database now contains two tables: “postcards”, with all of the information required to assemble a complete postcard (including the ID number of the graphic we wish to display), and “graphics”, which maps those ID numbers to the names.

How do we create an association between the two? We already know that it is possible to use SQL's SELECT command to retrieve one or more columns from a table. SELECT can also be used to retrieve columns from multiple tables and can even join columns together from them.

For instance, let's get the sender name, recipient address and graphic name from our two tables, such that the graphic name for each postcard is displayed immediately next to the unique ID and recipient address.

First, let's add another graphic file to the graphics table, just for good measure:

mysql> INSERT INTO graphics (
        VALUES (67890, bar.gif);
Query OK, 1 rows affected (0.00 sec)
mysql> select * from graphics;
2 rows in set (0.16 sec)
| graphic_id | graphic_file |
|      12345 | foo.gif      |
|      67890 | bar.gif      |

Let's now create a postcard to use for our example:

insert into postcards (id_number, sender_name,
        sender_email, recipient_name,
        recipient_email, postcard_text, graphic_id)
VALUES (99999, "Bill Clinton",
        "Al Gore",
        "Please call. I have a new tax idea.",
Notice that because they are integers, neither the postcard ID number nor the graphic ID number is surrounded by quotation marks. Also, notice how the initial list of columns in our INSERT command includes graphic_id, but leaves out graphic_text. Since graphic_text in postcards can contain null values, we can ignore it when inserting new rows into the table.

Now let's retrieve the sender name, recipient name and graphic file name for the postcard we just created:

mysql> select postcards.sender_name,
        -> from postcards,graphics
        -> where postcards.id_number = 99999;
        2 rows in set (0.31 sec)
| sender_name  | recipient_name | graphic_file |
| Bill Clinton | Al Gore        | foo.gif      |
| Bill Clinton | Al Gore        | bar.gif      |

Egad—this isn't what we wanted at all. We wanted it to bring up the file name for the ID we specified, not the combination of all rows and columns in the “graphics” table with our postcard. That's what we wanted to do, but that's not what we told the computer to do. By formulating our SQL query as above, we inadvertently asked for all possible combinations of rows from postcards with rows from graphics, this combination is known in database circles as the “Cartesian product” of the two. While asking for a Cartesian product is not an error, it is almost always undesirable. When dealing with especially large tables, asking for such a combination of tables can result in a long, unnecessary computation that ties up the database server keeping it from performing other tasks.

How can we modify our database query so that it does what we originally wanted, namely, giving us the graphics file name in place of its ID number in the postcards table? The simplest way is to set up a restriction between the two tables, adding to the WHERE clause in the query, as follows:

mysql> select postcards.sender_name,
        -> from postcards,graphics
        -> where postcards.id_number = 99999
        -> and postcards.graphic_id =
        1 rows in set (0.10 sec)
| sender_name  | recipient_name | graphic_file |
| Bill Clinton | Al Gore        | foo.gif      |

Now that's more like it. By requiring the equivalence between the graphic_id columns in both tables, we retrieved the information as if it came from a single table. That's part of the magic of SQL and relational databases. By combining tables in this way, you can make your data easier to handle by putting it in separate tables. However, when you retrieve the information, no one knows it came from separate tables, since a new, temporary table is returned to the caller.

This may seem like a silly example, but imagine a corporation with very strict pay scales whose payroll is handled by a relational database. If you give every employee an ID number indicating salary, you can give everyone a raise (or a reduction, depending on the company's financial state) by updating records in a salary table, rather than the table of employees. The next time you perform a join between the employee and salary tables, the new salary will be reflected automatically.


Geek Guide
The DevOps Toolbox

Tools and Technologies for Scale and Reliability
by Linux Journal Editor Bill Childers

Get your free copy today

Sponsored by IBM

8 Signs You're Beyond Cron

Scheduling Crontabs With an Enterprise Scheduler
On Demand
Moderated by Linux Journal Contributor Mike Diehl

Sign up and watch now

Sponsored by Skybot