Building a Linux-Based Appliance

by Jed Stafford

Have you ever solved the same system administration problem for many clients and wished you didn't have to reinvent the wheel every time? Or had the desire to build your own appliance but not known how? A recent consulting project gave us the incentive we needed to build our own appliance. By sharing the technical and business challenges we encountered and the solutions we implemented, we hope to offer some insight that will help you bring your own Linux-based appliance to market.

Our clients ask us to do a wide variety of IT projects, from setting up e-mail to implementing firewalls and VPN configurations. On a recent project, a customer asked us to look into the company's existing firewall configuration. They had a Cisco PIX firewall in place, but it was using an outdated version of the Cisco software. Given the costs involved in adding VPN support and purchasing the upgrade, they asked us to review with them the other firewall options on the market, including Check Point FW-1 and a Linux IPTables/IPSec solution. Based on their business requirements, they ultimately decided to go with a Check Point FW-1 firewall and VPN solution.

In implementing security solutions for other customers, and solving similar issues for each implementation, we had developed the idea of building a standalone firewall appliance. But we had not yet worked with the right customer to make an implementation possible. What finally made the decision to build an appliance easy was this particular customer's willingness to beta test the product.

In previous implementations we had developed some simple shell script-based tools to help automate common tasks for our customers and enhance the functionality of existing vendor-supplied tools. But as we developed the security appliance, we realized these shell scripts were simply not sufficient for a commercial product. As we developed a more advanced set of tools, we created a number of product features that should be useful for any appliance, not only a security appliance.

Product Requirements

With our goal in mind, we put together the following set of product requirements. We knew our customers would require a true standalone box in which all administration functionality would be completely self-contained and provided by the appliance itself. There should be no need for a separate Windows- or Solaris-based client (e.g., the existing Check Point tools). Moreover, the configuration software we provided would have to offer significant enhancements to the existing vendor-supplied tools. Our software would need to include backup/restore/undo functionality. Given that our hardware platform would be engineered to be fully redundant and to support automatic failover (two complete systems in a single 1U form factor), our appliance would need to come with built-in, preconfigured failover software support. In other words, our box would need to support all the fundamental components of a true appliance solution.

Users of single-function boxes, such as routers, have long known the major advantage of a complete, standalone appliance solution: the administrator can log on from virtually any machine or terminal to make configuration changes. There is no need to have special Windows, Solaris or other client software available/installed to make changes. Moreover, the administrator does not need to ensure hardware and software compatibility, install and configure the operating system, and then add and configure application software and management clients. With an appliance everything is completely self-contained. The administrator simply drops the new box into the network, logs in via ssh or a web browser to configure a few key settings, and the box is up and running.

Choosing the Operating System Platform

For Linux Journal readers, it goes without saying that Linux is the obvious choice for building appliances. It is worth mentioning that we also investigated Solaris and the Windows 2000 Server Appliance Kit as alternative platforms. Linux won because it was cost-effective, had a great community with good development support and had source code readily available.

Our V1 product does not include any changes to the kernel. Nevertheless, it was critical to know that as our customer base grows and our customers' requirements increase, we have the option of fine-tuning system performance and parameters through access to the source code. Moreover, there really is no better form of documentation than being able to look directly at the source code. And in the case of a bug or security hole, we are not dependent on any vendor for a patch or fix. In the worst-case scenario, we can make changes ourselves until a vendor-supplied patch becomes available.

We also wanted to use a platform that was well tested and vendor supported; would be easily and positively recognized by our enterprise customers; was used by lots of other developers, so it would be easy to have questions answered; and, most importantly, was supported by the vendors whose software we would be using on the appliance--in this case, Check Point. So while using SuSE was intriguing, Check Point's default support for Red Hat made Red Hat the clear choice for our product.

Architecture: API/Application Model

From a usability perspective, we wanted administrators to have easy access to the functionality we were delivering. This meant the product needed to support multiple interfaces:

  • A menu-based wizard that would allow easy access to common setup and configuration functions. It would take users through common tasks in a directed, step-by-step fashion rather than requiring them to read tons of documentation to run command-line utilities. The wizard would need to be accessible from a remote terminal interface, since the appliance itself would have little or no display interface built in. Lastly, it would need to be administrable from the local machine or from a remote console, whether or not a web browser was available. The wizard was also a good compromise between command-line or scripted functionality and a somewhat more cumbersome web-based interface (at least for those used to command lines).

  • For ease of use and for a more esthetically pleasing experience, a web-based interface that would provide access to similar functionality.

  • Command-line interfaces to the core functions for adding, deleting and displaying rules.

With these requirements in mind, a key aspect of the design of our architecture was to separate the user interface support from the base configuration functionality. From a development perspective, this modular design made it easy to separate our development efforts, allowing us to work independently on the configuration code and the user interface code. Moreover, it made it easy to wrap multiple user interfaces around the same set of base functions, for example, a command-line interface, a terminal interface and a web-based interface. Thus a core set of base programming interfaces was exposed, and the user interface developer was able to write to those interfaces.

The second part of making the code modular was to separate core functions into separate, self-contained script files. This way, someone else could eventually use our command-line programs for their own, separate purposes, without having to write all the base functionality themselves.

An additional benefit of this modularization was ease of testing. That is, because each function is self-contained, the individual components can be unit tested, making it much easier to quickly find and fix bugs in the code. All of this is basic good programming sense, but it is especially easy to ignore or forget when you're in the mode of creating system administration scripts for clients, rather than programs for broader distribution as products.

Developing the Base Functionality

Nearly all of the base configuration functionality is provided through a set of complex file-parsing scripts. We initially explored the possibility of using the Check Point APIs in C. But the development of our first program using these APIs showed that while it could return information about the firewall policies, it was somewhat limited in its ability to make changes to the firewall rulesets. Although the file parsing we eventually had to write ourselves was intricate, it turned out to be the only way to get everything that we needed. And it also turned out to be the fastest way to retrieve information and make configuration changes.

We developed two general types of file parsing functions: those used to display configuration information to the administrator and those used to modify policies (e.g., add or delete rules). As described above, each of these functions was developed as a standalone script called by a set of command-line parameters.
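
To sketch that pattern, here is the skeleton of a hypothetical standalone script; the name delrule and its parameters are illustrative, not our actual tool names. The point is that everything the script needs arrives on the command line, so any front end--or someone else's script--can drive it:

#!/usr/bin/perl -w
# delrule -- hypothetical standalone-script skeleton; all input
# arrives as command-line parameters so other tools can call it.
use strict;

my $usage = "usage: delrule <rule-number> [rules-file]\n";
my $rulenum = shift @ARGV;
die $usage unless defined $rulenum && $rulenum =~ /^\d+$/;
my $rulesfile = shift(@ARGV) || "Standard.W";

print "deleting rule $rulenum from $rulesfile\n";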

The configuration files were not documented at all, so the method we used to figure out how they worked was to make changes with the Check Point Policy Editor (a Windows application) and then observe what had changed in the files on the firewall machine. In some cases, dependencies between files must be taken into account when making the file changes directly. We also created a template file that served as the basis for creating and adding new rules to the rules file.

While Check Point FW-1 uses a number of files to maintain configuration, policy and object information, the two key files we had to deal with were Standard.W, which contains the firewall rules, and objects_5_0.C, which contains the network objects, protocols and encryption information. (Note that these filenames are specific to Check Point FW-1 Next Generation or "NG", the most recent version of the Check Point software available.) Our scripts first modify the necessary information in these files and then run fw load Standard.W localhost, which causes Check Point FW-1 to generate compiled output from the source rule and object files.
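
That last step is easy to script. Here is a minimal sketch, assuming the script runs on the firewall module itself with the fw binary on the PATH; the command is the one named above, but the error handling is our own:

# Recompile and install the policy after rewriting the source files.
my $output = `fw load Standard.W localhost 2>&1`;
die "policy load failed:\n$output" if $? != 0;
print $output;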

Both Standard.W and objects_5_0.C use a syntax that consists of tabs, colons and parentheses, with one item on each line. For example, here is part of a rule contained in Standard.W:

        :rule (
                :AdminInfo (
                        :chkpf_uid ("{7034DEB7-3558-F694-B3CA-EAB180757E7E}")
                        :ClassName (security_rule)
                )
                :action (
                        : (accept
                                :AdminInfo (
                                        :chkpf_uid ("{29985208-1F06-3377-B732-F9414E49DBE1}")
                                        :ClassName (accept_action)
                                        :table (setup)
                                )
                                :action ()
                                :macro (RECORD_CONN)
                                :type (accept)
                        )
                )
                ...
                ...
                :track (
                        : None
                )
                :comments (samplerule)

A single rule is shown in the above example. AdminInfo is a common field associated with every rule; action indicates the type of action to be taken (e.g., accept, drop); track indicates whether to log the action taken; and comments holds a comment associated with the rule. The firewall processes the rules in the order in which they are listed in the file.

The following example shows a snippet of the objects_5_0.C file:

        :network_objects (network_objects
                ...
                : (evrtwa1-test
                        ...
                        :ClassName (gateway_ckp)
                        :object_permissions (
                                ...
                                :owner ()
                                :read (
                                        : (any)
                                )
                                :use (
                                        : (any)
                                )
                                :write (
                                        : ()
                                )
                        )
                        :table (network_objects)
        ...
        :protocols (protocols
                ...
                : (FTP
                        ...
                        :handler (ftp_code)
                        :match_by_seqack (true)
                        :res_type (ReferenceObject
                                ...
                        )
                        :type (tcp_protocol)
                )

The above example shows a network object, evrtwa1-test, which is an administrator-defined object (as opposed to a built-in object). Below the network object is an example of a protocol description contained in the objects file. In this case it is the FTP protocol, which is of class TCP (being able to recognize this, for example, enables the display of protocols sorted by class).

The consistent formatting of the files made them fairly easy to parse. The real difficulties came in understanding the contents of the files and the interdependencies between them and in getting the parsing exactly right (e.g., determining when the full contents of an object must be included, if and when quotes must be used, the differences between handling standard versus user-defined objects and so on). Check Point uses UIDs to identify each object uniquely and even to identify items within a particular rule.

To display policy information, we created three scripts: Print Rules, Print Objects and Print Services. The biggest design win was, again, a result of modularization--separating the input of policy information from its display. We did this by creating an in-memory map of the rules table using Perl's built-in hashing support. For those who have only ever programmed in C, this and regular expression handling alone make Perl a language you will fall in love with right away. This separation allowed for easy formatting of the rules once they had been read in.

The first example below shows how a rule is read in. Each rule is added to the rulelist array, and each rule has a number of keys (such as disabled, comments, src, dst, services) that are stored in a hash. Each key then can have associated with it one or more values, which are stored, in order, in an array. So the rulelist array contains hashes, and each key in each hash points to an array containing one or more values. This seems like a complicated arrangement, but in Perl it is actually quite simple:

        if (/^\t\t\t: \(?\"?([^\(]+)\"?\s/) {
                ..
                $temp = $1;
                ..
                @fields = @{ $hash->{$type} };
                push(@fields, $temp);
                $hash->{$type} = [ @fields ];
                ..
        }
        ..
        push @rulelist, $hash;
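
To see the shape of the structure without the parsing details, here is a minimal, self-contained sketch; the field values are made up for illustration:

#!/usr/bin/perl -w
use strict;

my @rulelist;

# One rule: a hash whose keys each point to an array of values.
my $hash = {};
push @{ $hash->{src} },      'internal-net';
push @{ $hash->{dst} },      'mail-server', 'web-server';  # ordered values
push @{ $hash->{services} }, 'smtp';
push @{ $hash->{comments} }, 'samplerule';
push @rulelist, $hash;

# Walk the structure back out.
for my $rule (@rulelist) {
        for my $key (sort keys %$rule) {
                print "$key: @{ $rule->{$key} }\n";
        }
}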

Printing the rules once they are stored in this fashion is fairly straightforward (most of the pretty-print formatting code has been removed here):

        ..
        @printorder = ('disabled','src','dst','services','action',
                'track','track','install','install','time','comments');
        ..
        for $href ( @rulelist ) {
                ..
                for $item ( @printorder ) {
                        ..
                        $val = shift(@{ $href->{$item} });
                        ..
                        print substr($val, 0, $maxlen);
                }
        }

The Delete Rule function simply takes as input the number of the rule to delete, reads in the rules file and outputs it to a secondary file. Once the specified rule number is reached, output is suspended until the end of that specific rule, at which time it continues.
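
Here is a rough sketch of that copy-and-skip loop, assuming each rule opens with a ":rule (" line and ends when its parentheses balance; our real script also handles the file's header and footer sections:

#!/usr/bin/perl -w
# Sketch only: copy Standard.W to Standard.W.new, dropping rule $target.
use strict;

my $target = shift @ARGV;        # number of the rule to delete
open my $in,  '<', 'Standard.W'     or die "read: $!";
open my $out, '>', 'Standard.W.new' or die "write: $!";

my ($rulenum, $skipping, $depth) = (0, 0, 0);
while (<$in>) {
        if (!$skipping && /^\s*:rule \(/) {
                $rulenum++;
                if ($rulenum == $target) {
                        $skipping = 1;
                        $depth = 0;
                }
        }
        if ($skipping) {
                # Track parenthesis depth; the rule ends at depth zero.
                $depth += tr/(//;
                $depth -= tr/)//;
                $skipping = 0 if $depth == 0;
                next;
        }
        print $out $_;
}
close $in; close $out;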

The Add Rule function is significantly more complex. It must read in a template rule file and then create the necessary elements for the specified rule, including creating new UIDs where necessary. It also must look up existing UIDs and object information in the object file and incorporate that into the new Standard.W file. Lastly, it must intelligently determine where to place the rule in the new file (e.g., before/after an existing rule or at the top or bottom of the ruleset). The following code demonstrates outputting the source and destination for a rule to the new Standard.W file:

        ... /:dst/ || /:src/) {
                print $tabs, $_;
                PrintHeader ($rule_table[$type], $elts[0]);
                foreach $elem (@elts) {
                        if ($elem =~ /^!/) {
                                $elem = substr($elem, 1);
                        }
                        ...
                        if (GetType($elem) eq "dynamic_net_obj") {
                                print "$tabs", "$elem\n";
                        } else {
                                InsertObject($elem);
                        }
                        ...
                }
        ...

Thus the scripts are able to take the line-by-line information contained in the policy and object files, parse it and display it in easy-to-read table format. Alternatively, the scripts can take user-input information and convert that into the format required by the Check Point configuration files, including bringing in any necessary external information.

Designing the User Interface (Wizard)

It seems to us that user interface design and construction is always harder than it appears. As soon as you get more than three or four basic options on the screen, you have to figure out how to organize them in a way that really works and makes sense to the user.

We have found that many Linux tools do not make use of a commonly available but often overlooked capability: color. The use of colors is important to help differentiate the information you are trying to convey to the user. In a terminal window, use of only a few ANSI color codes can mean the difference between lots of hard-to-read text and an easy-to-use menu system.

Rather than typing the ANSI color codes each time you want to change your output colors, it is easier to assign common color selections to variables in your scripts. This saves time and makes colors much easier to work with. Here is an example:

my $white = "\033[97m";                  # bright white foreground
my $nocolor = "\033[0m\033[39m\033[49m"; # reset attributes and colors

To output a line with multiple colors, you simply interpolate the corresponding variables. For example, to create a simple menu:

print "$white 1: $nocolor Menu Item 1\n";
print "$white 2: $nocolor Menu Item 2\n\n";     
print "Please choose a valid menu item from above
($white 1/2/C $nocolor ancle):";

The result appears as follows:

---------------------
1: Menu Item 1
2: Menu Item 2
Please choose a valid menu item from above
(1/2/Cancel): 
---------------------

In another part of our product, the user must choose from a list with over 200 options. To make that many options manageable in a terminal-based environment, we came up with the idea of matching on part of the word being sought. Although we do allow the user to display the entire list, enabling a partial string search made it significantly easier to locate desired services. The following code demonstrates searching for text in an array in Perl to support this functionality.

The @printobjectoutput array contains the entire list of services available. For each element in the array, we strip out all unnecessary characters and display modifiers (such as tabs) and compare the result to the search string entered by the user. We then display the list of matching services for the user to select from and include in the rule being added. We identify each service with a number to make item selection easy.

print "What would you like to search for?: ";
chomp($match = <STDIN>);
print "\n";
                foreach $element (@printobjectoutput) {
                        $element =~ s/^\s+|\s+$//g;
                        if ($element =~ /$match/)   # returns true if
$match is in $element
                        {
                        print "$white $printobjectcounter";
                        print "$nocolor  $element\n";                               
                        }
                $printobjectcounter++;
                }
print "\nWhich service do you wish to apply to the
rule?: ";                       
chomp($pickedservice = <STDIN>);

Here is some sample output generated by the code above:

-------------------------------------------
What would you like to search for?: timestamp
 229    timestamp
 230    timestamp-reply
Which service do you wish to apply to the rule?: 230
-------------------------------------------

Backup/Restore/Undo

Our product supports three types of backup/restore/undo capabilities:

  1. Full backup and restore via disk imaging

  2. Manual/scheduled backup of critical system and firewall configuration files

  3. An iterative undo capability

As you may recall, a unique aspect of our hardware platform is that it includes two complete systems in a single box. We utilized PartitionImage because it allowed us to make an exact copy of the first machine in this two-machine box. We were then able to create an image file that could be heavily compressed and stored on the second machine. Even if there is a total failure of one of the machines, the user doesn't need to have a boot disk, CD and so on. Instead, they can simply boot up the other machine and get the whole system back up and running. We also support a Restore CD for a worst-case scenario, although this tends to be confusing to the end user. Both the hard disk-based restore and the bootable Restore CD make use of PartitionImage.

To use PartitionImage, we booted from a Linux boot disk that loads the drivers and utilities necessary to access the drive you want to make an image of. It also can load your network drivers if you want to store the image across the network, which was one of the benefits of the tool. You can then choose to make an image of the drive on any other partition of that machine or across a remote networked filesystem (e.g., to another machine or storage device). PartitionImage makes an exact copy of the disk, sector by sector; it doesn't simply copy the files. PartitionImage has floppy disk images available on its web site that make it easy to create the boot disk.

Using PartitionImage also was critical during the development phase of the product. In some cases we made so many configuration changes to the system that it was unclear whether we had really gotten back to the original state, and without an image there was no way to return to the original setup and know for sure that it was a "clean" version.

The key was to create an image once we had installed the operating system and application software and performed the initial configuration. Otherwise, reloading the OS from scratch, reinstalling the application and then loading any patches or updates would have taken hours instead of minutes. Your development time is also much better spent, because you can hit restore, go work on something else and come back when it is done.

Restoring is as easy as backing up. You can restore either from the local hard disk's other partition (the one the image was saved to) or from the remote machine.
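
From memory, the command-line usage looks roughly like the following; check partimage --help for the exact syntax of your version, and note that the device and image names here are only examples (partimage appends a volume number such as .000 to the saved image):

partimage save /dev/hda1 /mnt/backup/appliance.partimg.gz
partimage restore /dev/hda1 /mnt/backup/appliance.partimg.gz.000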

In summary, PartitionImage is easy to use. It is an open-source tool that is actively updated, well supported and available free of charge, and it met our needs and withstood our testing.

Configuration Backup and Restore

To create the backup and restore utilities, it was critical to determine which files needed to be backed up. For this purpose we used the utility FCheck, a popular and useful Perl script by Michael A. Gumienny. FCheck makes it possible to take a snapshot of the files before changes are made and then view the differences after the changes are completed. FCheck is available at www.geocities.com/fcheck2000/fcheck.html. (It is also extremely useful for performing intrusion detection.)

Setup and configuration are performed by modifying the fcheck.cfg file, in which you can specify both paths and individual files to be monitored for changes. You can exclude individual files or directories and specify whether a monitored directory should be recursively scanned.
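
As a rough illustration, entries in fcheck.cfg look something like the following. We believe Directory and Exclusion are the relevant directive names, but the sample configuration file shipped with FCheck documents the exact syntax, and the paths here are only examples (you would add your Check Point configuration directory as well):

Directory       = /etc/
Exclusion       = /etc/mtab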

Before making changes to the configuration files, we ran FCheck as follows:

./fcheck -acd

This created a baseline file, which stores all of the original states of the files, including file size and time of last modification. After modifying the configuration files and loading a new policy from the Windows-based Policy Editor, we ran FCheck as follows:

./fcheck -ad | grep WARNING

This displayed the files changed during the policy modification process.

Iterative Undo

Our original undo capability simply created copies of the files that would be changed and then copied them back if an undo was required. However, customer feedback showed that an iterative undo capability was highly desirable, due to the number of changes an administrator might make to a firewall configuration before finalizing it. The backup portion of the undo functionality is called from all other scripts that make modifications to the firewall rules, for example, the scripts that perform rule addition or deletion.

The undo script works as follows:

  • It maintains a list of all the files to be backed up.

  • Backup files are stored as numbered files; e.g., the original file, Standard.W, is stored in the "undo" file Standard.W.00.

  • Each time a backup is required, the script determines the highest numbered existing backup file.

  • If the allowed number of undos has been exceeded, it deletes the highest numbered undo files.

  • It copies each remaining file to the next higher numbered file.

  • Finally, it makes the actual copy of the original file.

The undo script performs similar actions, reducing the file numbers rather than incrementing them. A condensed sketch of the backup-side rotation appears below.
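
This sketch rotates the numbered copies for a single file; the real script loops over its whole file list, and $maxundo and the file locations are illustrative:

#!/usr/bin/perl -w
use strict;
use File::Copy;

my $file    = 'Standard.W';
my $maxundo = 10;                      # illustrative limit

# Drop the oldest copy if we are at the limit, then shift every
# numbered copy up by one: .01 -> .02, .00 -> .01, and so on.
my $oldest = sprintf("%s.%02d", $file, $maxundo - 1);
unlink $oldest if -f $oldest;
for (my $n = $maxundo - 2; $n >= 0; $n--) {
        my $from = sprintf("%s.%02d", $file, $n);
        my $to   = sprintf("%s.%02d", $file, $n + 1);
        move($from, $to) if -f $from;
}

# Finally, make the new .00 copy of the live file.
copy($file, "$file.00") or die "backup failed: $!";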

Creating an undo capability is not exactly like the Undo you might use in a word processor, since changes to the firewall configuration may occur over a lengthy period of time. Thus, the administrator needs to be able to undo to a particular date and to determine the date of the next available undo.

Although the complete code sample is not shown, here are two useful takeaways for those writing their own scripts. First, Perl makes it easy to run an external program or script by using the ` character. For example:

$var = `./printdir`;

Second, for improved error handling, the script checks to make sure the directory that holds the undo files exists. Perl makes this easy through the -d file test operator for directories and the -f operator for files, for example:

if ( -d $dirname )
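
A fuller version of that check might look like this; the directory name is hypothetical:

my $dirname = "/usr/local/appliance/undo";   # hypothetical location
unless ( -d $dirname ) {
        mkdir($dirname, 0700) or die "cannot create $dirname: $!";
}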

Iterative undo and restore is a feature that really should be included in every product. In our case, it makes for a demo that is quite compelling to experienced administrators. Every administrator knows in the back of their mind that they should back up configuration files after each individual change. But we have all gone through the process of making multiple changes without making a backup and then realizing that one of those changes has caused a problem--but we don't know which one! Being able to demonstrate that the administrator can go back, step by step, through each individual change is extremely useful.

Ensuring Security

When making an appliance or bringing any networked machine on-line, you should turn on only the absolute minimum set of services required. As disk space today is not typically a cost issue, we recommend installing most if not all of the optional services you think you might need down the road. Services you do not need after setup should be disabled to reduce the chance of opening security holes on your machine and network. To disable services, you can rename the symbolic links associated with them in /etc/rc.d/rcX.d, where X is the system run level, or rename individual services in /etc/rc.d/init.d.
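
For example, at run level 3 you might neutralize a service's start link as follows (the service name is illustrative). The rc script acts only on links whose names begin with S or K, so the renamed link is simply ignored:

cd /etc/rc.d/rc3.d
mv S80sendmail _S80sendmail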

To ensure a secure system when the appliance is initially turned on, all remote services are blocked by default. An administrator must perform the initial configuration of the box from within the network or on the appliance directly. If desired, the administrator can change the rules to allow secure remote administration sessions. In either case, it is ultimately up to the administrator to determine which security settings are optimal for their desired implementation. But out of the box, the appliance defaults to the most secure configuration possible.

Getting the Software

With only two exceptions, all of the functionality described in the article is included in the downloadable version of the software available on our web site. The two exceptions are the failover mechanism, which requires the high-availability hardware configuration provided by the appliance, and the image-based restore capability, also included with the appliance. All the other capabilities, including the wizard-based policy editor, iterative undo and so on, are available in the software download.

Looking Ahead

Appliances used to be for tasks like load sharing and caching and were really targeted at internet and dot-com companies. But now appliances are appearing all over the place, with more and more enterprise-targeted uses. For example, appliances are being promoted as e-mail servers, web servers, corporate search engines and storage devices. It's simply easier to buy a box that already does it all, rather than choosing, purchasing and installing an operating system, installing and configuring application software and so on.

We believe appliances are an expanding market, serving the needs not only of internet companies, but of enterprises large and small. Ironically, this proliferation of standalone, single function appliances has resulted in a somewhat different and unexpected challenge: the complexity of managing the boxes themselves--both the hardware (power, connectivity, space) and the software (multiple diverse management consoles required to perform administration).

Jed Stafford is a developer of appliance software and hardware products at EdgeFinity, Inc.
