In addition to the column definitions and the UNIQUE qualifier, our definition for the StationLines table included three INDEX lines—one for each of the station_id, line_id and north_to_south columns.
While it often helps to think of a relational database table as a glorified spreadsheet with rows and columns, there are some important differences. One is that a database table does not store its rows in any particular order. If we are interested in retrieving rows from the table in a certain order, we must specify it with the ORDER BY clause in our query.
Because rows are not ordered in any particular way, a SELECT query can often take quite a while to fulfill. For example, take the following query:
SELECT id FROM RailStations WHERE name = "Tel Aviv Central";
This might not seem like a time-consuming query, given that it involves a single table and a simple WHERE clause. But since the rows of RailStations are not stored in any particular order, the server must scan the table to find the rows whose name is “Tel Aviv Central”. Such a scan takes a negligible amount of time in a 100-row table, but when a table contains 1,000 or 10,000 rows, the time becomes noticeable. In this particular example, the database server is probably smart enough to realize that RailStations.name has been declared UNIQUE, meaning our query will return at most one row; the server can thus stop scanning at the first match, searching through only half of the rows on average. But that can still take quite a while.
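A quick way to check how the server plans to execute such a query is MySQL's EXPLAIN statement. A sketch, using the query above (the exact output columns vary by MySQL version):

```sql
-- Ask MySQL how it plans to execute the query; the "key" column
-- of the output shows which index (if any) will be used, and
-- "rows" estimates how many rows must be examined.
EXPLAIN SELECT id FROM RailStations WHERE name = "Tel Aviv Central";
```

Running EXPLAIN before and after adding an index is an easy way to confirm that the index is actually being used.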
An index changes this picture by mapping each column value to the rows that contain it. If RailStations.name is indexed, the MySQL server can almost immediately find the rows containing a particular value. It can also determine, just as quickly, whether a value exists at all.
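An index can be declared when the table is created, or added afterward. A sketch, assuming the RailStations table from earlier (note that a column declared UNIQUE already has an index behind the scenes, so this is only needed for non-unique columns):

```sql
-- Add an index to an existing table's column.
ALTER TABLE RailStations ADD INDEX (name);

-- Equivalent alternative: CREATE INDEX, which requires giving
-- the index an explicit name.
CREATE INDEX name_index ON RailStations (name);
```

Either form works; you would use one or the other, not both, since each creates a separate index.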
If indexes can increase query speeds so dramatically, why are columns unindexed by default? The main answer is that an index must be updated each time an INSERT or UPDATE operation is performed on the table. Since the majority of database queries are SELECTs, whose performance an index can substantially improve, this is normally an acceptable trade-off. However, certain applications must INSERT and UPDATE at maximum speed, in which case maintaining an index can cause problems.
Since indexes are used to locate rows containing a certain value, they are necessary only for those columns that will be named in WHERE clauses. There is no need to index a column that is displayed but rarely used as a search criterion.
In some cases, it is enough to index the first part of each value rather than the entire column. For example, if we are indexing a column of type VARCHAR(50), we might index only the first 10 characters. This retains most of the advantages of a full index (since the first 10 characters are rarely identical in such a text field), while reducing the amount of information the index must store.
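In MySQL, such a prefix index is declared by putting the length in parentheses after the column name. A sketch, using a hypothetical Cities table for illustration:

```sql
CREATE TABLE Cities (
    id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    INDEX(name(10))    -- index only the first 10 characters of name
);
```

The same syntax works with ALTER TABLE and CREATE INDEX when adding a prefix index to an existing table.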
Now that we have thoroughly examined the tables describing the train system, it is time to put some trains on those tracks. The question of how to model this data is a tough one, since there are a number of ways in which to accomplish it. I decided to split this information into two tables, Trains and DepartureTimes.
Each row of Trains describes a particular train, indicating the line on which it runs, the ID numbers of its origin and destination stations, and the time it departs from its origin:
CREATE TABLE Trains (
    id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    line_id TINYINT UNSIGNED NOT NULL,
    origin_id TINYINT UNSIGNED NOT NULL,
    destination_id TINYINT UNSIGNED NOT NULL,
    depart_origin_time TIME NOT NULL,
    UNIQUE(line_id, origin_id, destination_id, depart_origin_time),
    INDEX(line_id),
    INDEX(origin_id),
    INDEX(destination_id),
    INDEX(depart_origin_time)
);
The first column is a primary key, allowing us to identify each train with a single number. The combination of rail line, origin, destination and departure time should be unique, so we ask the database server to enforce this condition with the UNIQUE keyword.
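To see the UNIQUE constraint in action, consider inserting the same combination twice (the ID numbers and time here are invented for illustration):

```sql
INSERT INTO Trains (line_id, origin_id, destination_id, depart_origin_time)
    VALUES (1, 1, 5, "06:00");

-- Repeating the identical combination violates the UNIQUE constraint,
-- and the server rejects it with a duplicate-key error:
INSERT INTO Trains (line_id, origin_id, destination_id, depart_origin_time)
    VALUES (1, 1, 5, "06:00");
```

Thanks to AUTO_INCREMENT, we never specify the id column ourselves; the server assigns it.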
Finally, we define the DepartureTimes table, which stores information on when a train will leave from a particular station:
CREATE TABLE DepartureTimes (
    train_id SMALLINT UNSIGNED NOT NULL,
    station_id TINYINT UNSIGNED NOT NULL,
    departure_time TIME NOT NULL,
    INDEX(train_id),
    INDEX(station_id),
    INDEX(departure_time)
);
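Populating DepartureTimes is a matter of one INSERT per stop. A sketch with invented sample data, chosen so that trains 1 and 2 both call at station 4 before 8 a.m.:

```sql
-- Hypothetical sample rows: each row records when a given train
-- departs from a given station.
INSERT INTO DepartureTimes (train_id, station_id, departure_time)
    VALUES (1, 4, "07:10"),
           (2, 4, "07:45"),
           (3, 4, "09:20");
```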
Once we enter information into these tables, we can start to perform sophisticated queries. For example, which trains arrive at “Tel Aviv Central” before 8 a.m.?
SELECT train_id
    FROM DepartureTimes
    WHERE departure_time < "08:00"
        AND station_id = 4;

Sure enough, this query returns a table containing two rows:
+----------+
| train_id |
+----------+
|        1 |
|        2 |
+----------+

Now we know two trains will arrive in Tel Aviv early enough for us to catch a morning meeting. But which trains are those? It would be nice to get more information than that. One possibility is to print the name of the origin station and the hour at which each train leaves:
SELECT S.name, T.depart_origin_time
    FROM DepartureTimes DT, Trains T, RailStations S
    WHERE DT.departure_time < "08:00"
        AND DT.station_id = 4
        AND DT.train_id = T.id
        AND S.id = T.origin_id;

Notice how SQL allows us to use < and > when handling dates and times, for columns declared as DATE, TIME or DATETIME. Given the contortions one must use in order to compare dates and times in nearly any programming language, this built-in date comparison is still one of my favorites.
Assuming we want to take the first train of the day (ID 1), we can print the list of when it will arrive at each station:
SELECT T.id, S.name, DT.departure_time
    FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL
    WHERE T.id = DT.train_id
        AND T.id = 1
        AND T.line_id = SL.line_id
        AND SL.station_id = DT.station_id
        AND DT.station_id = S.id
    ORDER BY T.id, SL.north_to_south;
We can even print a full schedule for trains to Tel Aviv (ID 5):
SELECT T.id, S.name, DT.departure_time
    FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL
    WHERE T.id = DT.train_id
        AND T.line_id = SL.line_id
        AND SL.station_id = DT.station_id
        AND DT.station_id = S.id
        AND T.destination_id = 5
    ORDER BY T.id, SL.north_to_south;

Finally, we can retrieve a full schedule for trains to Tel Aviv (ID 5) that leave after 9 a.m.:
SELECT T.id, S.name, DT.departure_time
    FROM RailStations S, DepartureTimes DT, Trains T, StationLines SL
    WHERE T.id = DT.train_id
        AND T.line_id = SL.line_id
        AND SL.station_id = DT.station_id
        AND DT.station_id = S.id
        AND T.destination_id = 5
        AND T.depart_origin_time > "09:00"
    ORDER BY T.id, SL.north_to_south;
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality means UNIX system administrators always seem to have the right tool for the job.
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to upgrade your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.

Register Now!
- SUSE LLC's SUSE Manager
- Google's SwiftShader Released
- My +1 Sword of Productivity
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- SuperTuxKart 0.9.2 Released
- Interview with Patrick Volkerding
- Parsing an RSS News Feed with a Bash Script
- Rogue Wave Software's Zend Server
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn't consider the total cost of ownership, and it doesn't consider the advantage of real processing power, high availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.

Get the Guide