Creating a Web-Based BBS, Part 3

Mr. Lerner shows us how to add a full-text search to our BBS.

Over the last two months, we have looked at a simple bulletin-board system that can be incorporated into a web site. This BBS groups messages by subject or thread, and stores the messages in a relational database.

As we have seen in the last two installments of ATF, using a database for information storage and retrieval significantly reduces the amount of development work necessary to implement such a system. Given the ease with which I was able to implement the BBS, I decided to add several more features which can help users navigate their way through and use the BBS.

This month, we will look at how to implement several of these features. The most important is full-text search, which allows users to find interesting postings based on keywords. This saves them from having to search through a thread, which might not have an appropriate or easy-to-understand title. Then we will look at a tool that allows administrators to remove inappropriate postings without having to go into the guts of the database.

In Case You Just Tuned In

If you did not catch the last two installments of “At the Forge”, let's take a quick look at how the BBS is implemented. I used MySQL (see Resources), a relational database that has gained quite a bit of popularity among web programmers. Information in a relational database is stored in tables, in which the rows represent records and the columns represent fields.

We define the columns with SQL, the Structured Query Language that is a well-known standard for working with relational databases. Our BBS will contain two separate tables, ATFThreads (for keeping track of the individual threads, including the initial posting) and ATFMessages (for keeping track of individual messages). SQL is sent to a database server from a database client; this can be a programmatic client (e.g., a CGI program) or an interactive client (e.g., the mysql program that comes with MySQL). I normally use interactive clients for table creation, maintenance and debugging, but we can create our two tables with the following SQL:

CREATE TABLE ATFThreads (
id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    subject VARCHAR(255) NOT NULL,
    author VARCHAR(60) NOT NULL,
    email VARCHAR(60) NOT NULL,
    text TEXT NOT NULL,
    date DATETIME NOT NULL,
    UNIQUE(subject)
)
CREATE TABLE ATFMessages (
    id MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    thread SMALLINT UNSIGNED NOT NULL,
    subject VARCHAR(60) NOT NULL DEFAULT
    "No subject",
    date DATETIME NOT NULL,
    author VARCHAR(60) NOT NULL DEFAULT
    "Mr. Nobody",
    email VARCHAR(60) NOT NULL DEFAULT
    "atf@lerner.co.il",
    text TEXT NOT NULL
)

If you enter the above into the interactive mysql program, you will want to put a semicolon (;) after each query to indicate that you want mysql to execute the query right away, rather than wait for additional input.

Once we have created the tables, we will have to write some programs—in our case, CGI programs—that manipulate the data. We will not write programs with SQL directly, but will create it within our CGI programs. Using Perl for such CGI programs is particularly easy once we use CGI.pm (the standard Perl module for CGI programs) and DBI (Perl's generic database interface).

The entire bulletin board system consists of about seven programs, each of which handles a different aspect of the system. You can download the programs from the FTP sites ftp.linuxjournal.com/pub/lj/listings/issue57/3193.tgz and ftp.linuxjournal.com/pub/lj/listings/issue58/3252.tgz Listings for this article will be in ftp.linuxjournal.com/pub/lj/listings/issue59/3296.tgz

Now that we have given the basic functionality a quick discussion, let's start to add some advanced functionality to our BBS.

Searching

The first new function we will add to the BBS is full-text searching. I have long been a fan of full-text search, both on the Web and elsewhere. According to Jakob Nielsen, probably the best-known web-usability researcher, many users are “search-dominant”. This means they prefer to search through a site for content, rather than traverse through trees of hyperlinks. (See Resources for more information on Nielsen, including the URL of his article.)

While the threaded structure of our BBS makes it relatively easy to find postings on a particular topic, undoubtedly there will be times when subject lines fail to reflect the actual content or when discussions veer into unexpected or unusual directions. Allowing users to search for words or phrases makes it easier for them to find what they want. Best of all, because the search functionality is in a separate program, it slows down the system only when someone uses it. If no one ever searches through the BBS, the system will not be slowed down.

MySQL allows for two kinds of searches through a table, with either SQL regular expressions or UNIX-style regular expressions. SQL regular expressions might seem silly to someone used to working with UNIX, but they are guaranteed to work on any database system that adheres to SQL standards. SQL regular expressions have two special characters: % (which matches zero or more characters) and _ (which matches exactly one character). To escape these special characters, you insert a leading backslash (\). To get a literal backslash, you insert two backslashes (\\).

To search for a match with SQL regular expressions, you use the LIKE operator within a SELECT statement. This returns all of the rows for which the regular expression finds a match. For instance:

SELECT text FROM ATFMessages WHERE text LIKE "a%"

will retrieve all message texts that begin with the letter a, followed by zero or more characters.

If you prefer UNIX-style regular expressions, MySQL allows you to use the REGEXP (or RLIKE) operator, as in

SELECT text FROM ATFMessages WHERE text RLIKE "a.*"

This will perform the same function as above.

Rather than force one system on all users, I decided to allow for both literal text and UNIX regular expressions. UNIX regular expressions are hard for most people to learn and understand, so the default is to allow for literal text searches. We perform a literal text search by using the LIKE operator, escaping the two SQL regular expression meta-characters with backslashes.

The search form itself (search-form.shtml, see Listing 1 in the archive file for an expanded version) is thus very short:

<P>Search for: <input type="text"
name="term"></P>
<input type="radio" name="regexp"
   checked value="no">Literal search
<input type="radio" name="regexp"
   value="yes">Use regular expressions
<input type="submit" value="Search!">

This form is submitted to the CGI program search.pl (Listing 2 in the archive file), which performs the actual searching. Search.pl is also fairly simple, although the SQL query is the most complex we have seen in this project. That's because we have to search through ATFMessages to find matches. We also need the message's thread ID number in order to create a hyperlink to view-thread.pl, which allows users to look at that thread.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState