Portable Database Management with /rdb

Web server analysis logs and mailing list management is made easy by using the /rdb database system—here's how to do it.
/rdb Filters

Most /rdb commands operate on tables that are read through stdin. The resulting table is written to stdout. Here are a few examples:

row 'Cost < 50' inventory

selects the rows in the inventory table where the Cost field is less than 50.

column 'Description' < inventory
reads the inventory table and outputs a table containing only the column Description.
addcol 'Value' < inventory
reads inventory and outputs a new table that is identical to inventory except for the addition of a new column called Value. At this point, all of the fields in the Value column are initialized to empty strings.
compute 'Value = Cost * Quantity'
reads an input table and, on each row, sets Value to Cost * Quantity. If Value is a column name, the computed value is stored in the table; otherwise, it could still be referenced in other statements within the same compute statement.

Putting these commands together, the Bourne shell script in Listing 1 will output a table describing inventory items which cost less than $50 US and showing the value of each.

These commands are only a few of the filters that /rdb provides. For a complete listing, see the on-line man pages listed in Resources.

The row and compute filters are actually front ends for awk. Since awk doesn't understand field names as defined by /rdb, these commands convert field names into column numbers which awk does understand. As a result, the user who is familiar with awk can immediately use these commands and many others provided by /rdb.

/rdb Table Editing

Because /rdb tables are ASCII files, they can be created and maintained by any text editor. This will work in many cases but will be awkward in others, in particular, with wide tables or those containing a large number of columns. /rdb provides two table editors, ve and jve for this purpose. A complete description of these editors is beyond the scope of this article, but the following is a partial list of the capabilities provided:

  • Forms based data entry: Screen files, ASCII files describing data-entry screen layout, can be specified using the -s option and are used to associate screen field positions with table column names and to provide initialization and write protection for certain fields.

  • Data validation: The -v option is used to specify a validation file to provide data edits based on allowed and disallowed ranges of characters, column length limits, table lookup and command-based validation. With command-based validation, the values can be passed to any arbitrary command or shell script. The exit status indicates to jve whether or not the data is considered valid (zero for success, non-zero for failure). If a non-zero status is received, jve will beep and display the first line of standard error at the bottom of the screen.

  • Audit file creation and maintenance: An audit file tracks changes to a table and can be used by the rollback command to restore a table to a specific point in time.

Web Server Log Analysis

In order to illustrate how /rdb operates in conjunction with other Unix shell commands, the script in Listing 2 shows how a log file can be analyzed to report a server's top five accessors. This script assumes that stdin is a log file with space delimited fields with the host name as the first field (this is typical for Apache and several other web servers). The following is a step by step description of the process:

  1. The combination of awk and the header command produces a table with host names in order of their appearance in the log file. The output from awk is not in /rdb table format, so header is used to complete the production of the table.

  2. The addcol command followed by compute is used to add a column named n and initialize its value to 1.

  3. The table is then sorted by host and the n values are totaled using host as a control break to produce a table of the host names and the total access counts for each, i.e., the value of n.

  4. Since we are interested in seeing the five hosts with the highest access counts, the table produced in step 3 is sorted by n.

  5. The top seven lines from the table in step 4 (two header lines plus five rows) are filtered out by the head command and passed to the justify command which vertically aligns the header and columns and produces a format better suited to printing.

The script in Listing 2 produces the following output:

host n
wilbur.leba.net 106
gateway.amat.com        24    20
ras3.fsxnet.com 14
ohnp6.m50.nrc.ca        14

Actually, it wasn't necessary to create and store any /rdb tables.