In addition to the column definitions and the UNIQUE qualifier, our definition for the StationLines table included three INDEX lines—one for each of the station_id, line_id and north_to_south columns.
While it often helps to think of a relational database table as a glorified spreadsheet with rows and columns, there are some important differences. One is that a database table does not store its rows in any particular order. If we are interested in retrieving rows from the table in a certain order, we must specify it with the ORDER BY clause in our query.
Because rows are not ordered in any particular way, a SELECT query can often take quite a while to fulfill. For example, take the following query:
SELECT id FROM RailStations WHERE name = "Tel Aviv Central";
This might not seem like a time-consuming query, given that it involves a single table and a simple WHERE clause. But since the rows of RailStations are not stored in any particular order, finding the rows where the name is “Tel Aviv Central” can take quite a while. This might be a negligible amount of time in the case of a 100-row table, but when a table contains 1,000 or 10,000 rows, the time can become noticeable. In this particular example, the database server is probably smart enough to realize that RailStations.name has been declared UNIQUE, meaning our query will return one row, if it returns anything. This means the server will, on average, have to search through only half of the rows—but that can still take quite a while.
An index changes this picture by adding a pointer to each column value. If RailStations.name is indexed, the MySQL server can almost immediately find those rows containing a particular value. It can also determine whether a value exists at all.
If indexes can increase query speeds so dramatically, why are rows unindexed by default? The main answer is that indexes are written and updated each time an INSERT or UPDATE operation is performed on a table. Since the majority of database queries are SELECTs, in which the index can substantially improve performance, this is normally an acceptable trade-off. However, certain applications must INSERT and UPDATE at maximum speed, in which case creating an index can cause problems.
Since indexes are used in locating columns of a certain value, they are necessary for only those columns that will be named in WHERE clauses. There is no need to index a column that is displayed, but rarely used as a search criterion.
In some cases, it is enough to index the first part of each column rather than the entire column. For example, if we are indexing a column of type VARCHAR(50), then we might be able to index only 10 of those characters. This will retain most of the advantages of a full index (since the first ten characters are rarely identical in such a text field), while reducing the amount of information the index must store.
Now that we have thoroughly examined the tables describing the train system, it is time to put some trains on those tracks. The question of how to model this data is a tough one, since there are a number of ways in which to accomplish it. I decided to split this information into two tables, Trains and DepartureTimes.
Each row of Trains describes a particular train, indicating the line on which it runs, the ID numbers of its origin and destination stations, and the time it departs from its origin:
CREATE TABLE Trains ( id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, line_id TINYINT UNSIGNED NOT NULL, origin_id TINYINT UNSIGNED NOT NULL, destination_id TINYINT UNSIGNED NOT NULL, depart_origin_time TIME NOT NULL, UNIQUE(line_id, origin_id, destination_id, depart_origin_time), INDEX(line_id), INDEX(origin_id), INDEX(destination_id), INDEX(depart_origin_time) );
The first column is a primary key, allowing us to describe each train with a single number. The combination of a rail line, origin, destination and hour should be unique, so we ask the database server to enforce this condition with the UNIQUE keyword.
Finally, we define the DepartureTimes table, which stores information on when a train will leave from a particular station:
CREATE TABLE DepartureTimes ( train_id SMALLINT UNSIGNED NOT NULL, station_id TINYINT UNSIGNED NOT NULL, departure_time TIME NOT NULL, INDEX(train_id), INDEX(station_id), INDEX(departure_time) );
Once we enter information into these tables, we can start to perform sophisticated queries. For example, which trains arrive at “Tel Aviv Central” before 8 a.m.?
SELECT train_id FROM DepartureTimes WHERE departure_time < "08:00" AND station_id = 4;Sure enough, this query returns a table containing two rows:
+----------+ | train_id | +----------+ | 1 | | 2 | +----------+Now we know two trains will arrive in Tel Aviv early enough for us to catch a morning meeting. But which trains are those? It would be nice to get more information than that. One possibility is to print the name of the origin station and the hour at which the train leaves:
SELECT S.name, T.depart_origin_time FROM DepartureTimes DT, Trains T, RailStations S WHERE DT.departure_time < "08:00" AND DT.station_id = 4 AND DT.train_id = T.id AND S.id = T.origin_id;Notice how SQL allows us to use < and > when handling dates and times, for columns declared as DATE, TIME or DATETIME. Given the contortions one must use in order to compare dates and times in nearly any programming language, this built-in date comparison is still one of my favorites.
Assuming we want to take the first train of the day (ID 1), we can print the list of when it will arrive at each station:
SELECT T.id, S.name, DT.departure_time FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL WHERE T.id = DT.train_id AND T.id = 1 AND T.line_id = SL.line_id AND SL.station_id = DT.station_id AND DT.station_id = S.id ORDER BY T.id, SL.north_to_south ;
We can even print a full schedule for trains to Tel Aviv (ID 5):
SELECT T.id, S.name, DT.departure_time FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL WHERE T.id = DT.train_id AND T.line_id = SL.line_id AND SL.station_id = DT.station_id AND DT.station_id = S.id AND T.destination_id = 5 ORDER BY T.id, SL.north_to_south ;Finally, we can retrieve a full schedule for trains to Tel Aviv (ID 5) that leave after 9 AM:
SELECT T.id, S.name, DT.departure_time FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL WHERE T.id = DT.train_id AND T.line_id = SL.line_id AND SL.station_id = DT.station_id AND DT.station_id = S.id AND T.destination_id = 5 AND T.depart_origin_time > "09:00" ORDER BY T.id, SL.north_to_south ;
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Happy GPL Birthday VLC!
- Unikernels, Docker, and Why You Should Care
- Handheld Emulation: Achievement Unlocked!
- Controversy at the Linux Foundation
- Giving Silos Their Due
- Don't Burn Your Android Yet
- Firefox OS
- Wine 1.8 Released