Linux and the Next Generation Internet
Our Diffserv implementation is relatively simple, but highlights many of the strengths and flexibilities of Linux, which we believe are especially important in the era of Internet-centric computing. To make this a more complete description of the several technologies related to “differentiated services”, we include some background material on the purpose and structure of Diffserv as well as highlights of other, more complex implementations of Diffserv environments which have benefitted from the sophistication of the Linux kernel.
Although our implementation of a Diffserv environment is fairly straightforward, we feel it's also interesting for these reasons:
We use an approach that allows reconfiguration of the entire network instantaneously and on-demand from a “network management workstation”. This was a key component of our environment, because one of our main goals was to demonstrate concisely, for a non-technical audience, using real-time applications, the “before-after” effects of a Diffserv-enabled network.
We developed this environment supported, in part, by one of the major Regional Bell Operating Companies (RBOCs) and by the National Science Foundation through a “Research Experiences for Undergraduates” (REU) supplement to an existing NSF grant. As such, the system configurations, software and most of the architecture were developed primarily by undergraduate engineering students at our university.
Our demonstration environment has been used for live, hands-on demonstrations at two large regional meetings: the Southeastern Universities Research Association (SURA) Applications Workshop (Sept. 1999) and the Bellsouth Science & Technology Innovations Showcase (Oct. 1999).
The key to “service differentiation” (or Quality of Service, QoS) in the Internet is the way routers handle (or can be easily modified to handle) multiple classes of traffic with various requirements for transport. The term “Diffserv” refers to an approach for implementing such capabilities which is being defined (through the usual method of constructing Internet standards) to be broadly compatible with the scope and flavor of the global Internet (see Resources 1).
The architecture of Diffserv can be viewed in terms of relatively simple functional units in the Internet's “forwarding nodes” (routers) (see Resources 2). The simplicity of Diffserv is important because, in theory, it has the potential to provide coarse differentiation between types of Internet traffic without requiring a fundamental change to the current configuration of the Internet.
One of the functional units described by Diffserv is a set of “per-hop behaviors” (PHBs). The idea behind PHBs is to let each router easily and quickly classify packets into different types of output queues based on a “tag” embedded in the packet header. Square tags go in “square” queues. Round tags go in “round” queues. Packets in the “square” queue get treated differently than packets in the “round” queue.
The scheme works in much the same way airline passengers are allowed to check bags or board the plane: “first class” goes first, “coach class” is next, and “standby” is last if there's enough room. There are also other functional units in Diffserv which are often called packet classification and traffic conditioning. In keeping with the airline analogy, packet classification is akin to the act of purchasing a type of ticket (or having one assigned to you, based on some rules) and traffic conditioning is like the disturbances (e.g., bumping and rerouting) experienced by a group of passengers when a flight is canceled or delayed. For our demonstration environment, we focus primarily on the PHBs and differences between particular classifications of traffic when the network is congested. To put it another way, we essentially ask the following questions: “Is first class really better than coach?” and “How can I tell?”
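The boarding-order idea can be sketched in a few lines of Python. This is only an illustration, not part of our testbed: the tag names are hypothetical, and serving the queues in strict priority order is our assumption about what "treated differently" means here.

```python
from collections import deque

# Hypothetical tags, listed in the order they are served.
# Strict priority between queues is an assumption made for this sketch.
SERVICE_ORDER = ["first", "coach", "standby"]


class PerHopBehaviors:
    """Sort packets into one output queue per tag, then serve by priority."""

    def __init__(self):
        self.queues = {tag: deque() for tag in SERVICE_ORDER}

    def enqueue(self, tag, payload):
        # Packets with an unknown tag fall back to the lowest class.
        queue = self.queues.get(tag, self.queues["standby"])
        queue.append(payload)

    def dequeue(self):
        # Drain higher classes completely before touching lower ones.
        for tag in SERVICE_ORDER:
            if self.queues[tag]:
                return self.queues[tag].popleft()
        return None


router = PerHopBehaviors()
arrivals = [("standby", "s1"), ("coach", "c1"), ("first", "f1"), ("coach", "c2")]
for tag, payload in arrivals:
    router.enqueue(tag, payload)

served = [router.dequeue() for _ in range(len(arrivals))]
print(served)  # ['f1', 'c1', 'c2', 's1']
```

Packets leave in class order regardless of arrival order, which is exactly the effect a passenger in "standby" notices when the flight is full.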
In Diffserv, the “first class” designation is called expedited forwarding (EF) (see Resources 3). The idea of EF is to simulate a “virtual leased line” by ensuring minimal queuing of packets within each router along the transport path. As such, the EF class aims to provide guarantees on delay and jitter, which are important for isochronous data streams (e.g., video and audio). This is one of Diffserv's weak points, in our opinion. Because Diffserv is explicitly designed not to distinguish between individual traffic streams, only the aggregate EF flow receives the desired treatment. No “hard promises” can be made to individual flows unless there are very few EF flows, a result that has been noted with some chagrin in many publications (see Resources 4). In practice, individual streams within the EF class can experience high amounts of jitter between successive packets. Given the stated goals and the architecture of Diffserv, the only way to minimize these effects is to practice “gross overprovisioning”, where only a small percentage of the available bandwidth is made available to the EF class, and only a few EF streams are allowed. In the airline analogy, the number of first-class passengers on any flight would have to be limited strictly to a tiny proportion of the available seats. Otherwise, the flight attendants wouldn't be able to guarantee good service.
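A rough back-of-the-envelope calculation shows why the number of EF flows matters. The model below is a deliberate simplification and an assumption on our part: every EF flow shares one FIFO, each sends a 1500-byte packet at the same instant, and no other traffic is considered. The 10Mbps link rate matches the congested link in our testbed.

```python
def serialization_delay(packet_bytes, link_bps):
    """Seconds needed to put one packet on the wire."""
    return packet_bytes * 8 / link_bps


def worst_case_wait(ef_flows, packet_bytes, link_bps):
    """Added delay for a packet queued behind one packet from every
    other EF flow that shares the same FIFO (simplified model)."""
    return (ef_flows - 1) * serialization_delay(packet_bytes, link_bps)


# Assumed values: 1500-byte packets on a 10Mbps link.
for flows in (1, 2, 5, 10):
    wait_ms = worst_case_wait(flows, 1500, 10_000_000) * 1000
    print(f"{flows:2d} EF flows: up to {wait_ms:.1f} ms of added delay")
```

Under these assumptions, each additional EF flow can add up to 1.2 ms of variation to any other flow's packets at a single hop, which is why the aggregate can look healthy while an individual stream does not.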
The “coach class” designation in Diffserv is called assured forwarding (AF) (see Resources 5) and is a bit more complicated than EF. The complication of AF is primarily due to the fact that there are four different classes of AF, and each class has three subtypes. The difference between AF classes is related to different levels of “forwarding assurances”. The difference between subtypes in each AF class is related to different levels of “drop precedence” or relative importance within the class (i.e., low, medium, high).
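The four-by-three structure of AF is easiest to see as a table. The sketch below uses the codepoint values recommended in the AF specification (RFC 2597), where class x with drop precedence y is written AFxy; the function name is ours and purely illustrative.

```python
def af_dscp(af_class, drop_precedence):
    """Recommended DSCP for AF class 1-4 and drop precedence 1-3 (RFC 2597)."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF class must be 1..4")
    if not 1 <= drop_precedence <= 3:
        raise ValueError("drop precedence must be 1 (low) .. 3 (high)")
    # The class occupies the top three bits, the drop precedence the next two.
    return 8 * af_class + 2 * drop_precedence


# Print the full table: one row per class, one column per drop precedence.
for af_class in range(1, 5):
    row = []
    for drop in range(1, 4):
        value = af_dscp(af_class, drop)
        row.append(f"AF{af_class}{drop}={value:2d} ({value:06b})")
    print("  ".join(row))
```

For example, AF11 is codepoint 10 (binary 001010) and AF43 is codepoint 38 (binary 100110).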
The relationship between “class” and “drop precedence” is subtle. Each class is allocated resources (such as buffer space, bandwidth and so on) at each forwarding node (router). These resources comprise a level of “assurance” that packets from each class will be forwarded as desired. Transmissions can exceed these resources at their own peril, described by the “drop precedence”. So, within the AF designation, forwarding depends on the relationships between the instantaneous traffic load at a router, the “available” resources compared to the “desired” resources and the drop precedence of each packet.
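One way to picture the interplay of class resources and drop precedence is a buffer that becomes stricter as it fills. The thresholds below are invented for illustration (they are not taken from any Diffserv specification or from our routers), and real implementations typically use probabilistic dropping rather than hard cutoffs.

```python
# Assumed thresholds: the fraction of the class buffer beyond which a packet
# with the given drop precedence is discarded.
DROP_THRESHOLDS = {"low": 1.0, "medium": 0.75, "high": 0.5}


class AfClassQueue:
    """A single AF class with a fixed buffer allocation."""

    def __init__(self, capacity):
        self.capacity = capacity  # buffer slots allocated to this class
        self.packets = []

    def offer(self, packet, drop_precedence):
        """Accept the packet unless the buffer is too full for its precedence."""
        limit = DROP_THRESHOLDS[drop_precedence] * self.capacity
        if len(self.packets) >= limit:
            return False  # dropped
        self.packets.append(packet)
        return True


queue = AfClassQueue(capacity=4)
outcomes = [
    queue.offer("h1", "high"),    # buffer empty: accepted
    queue.offer("h2", "high"),    # accepted, buffer now half full
    queue.offer("h3", "high"),    # dropped: high precedence stops at 50%
    queue.offer("m1", "medium"),  # accepted, buffer now 75% full
    queue.offer("m2", "medium"),  # dropped: medium precedence stops at 75%
    queue.offer("l1", "low"),     # accepted, buffer now full
    queue.offer("l2", "low"),     # dropped: no room left at all
]
print(outcomes)
```

The same instantaneous load produces different outcomes for different packets, which is the "at their own peril" behavior described above.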
The “standby class” designation in Diffserv is the well-known best effort (BE) behavior of the current Internet. So, coarse differentiation between service levels is made by classifying packets as BE (poor), AF (better with conditions) or EF (best).
As a result of the functional unit architecture of Diffserv, and in an effort to push per-stream complexity to the edge of the network, there are actually at least two different types of routing/forwarding nodes in a Diffserv domain. According to the Diffserv specification, “edge” routers use a (possibly complex) set of rules to insert tags into the header of each IP packet. These tags are called “Diffserv Code Points” or DSCP (see Resources 6). Once the packets have been tagged and admitted into the interior of the Diffserv domain, “core” routers simply have to examine each packet's DSCP and assign it to the corresponding output queue to be forwarded on to the next node. With proper network architecture, each packet should be able to consume the forwarding resources it needs and is entitled to as a result of its “tag”.
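The division of labor between edge and core can be sketched with a little bit arithmetic. The DSCP occupies the upper six bits of the byte formerly known as TOS (RFC 2474), and 46 is the recommended EF codepoint; the queue table and function names below are hypothetical stand-ins, not our router configuration.

```python
DSCP_MASK = 0xFC   # upper six bits of the former TOS byte
DSCP_SHIFT = 2
LOW_BITS = 0x03    # the two bits that are not part of the DSCP

EF_DSCP = 46       # recommended codepoint for expedited forwarding
BE_DSCP = 0        # default (best effort)


def edge_mark(tos_byte, dscp):
    """Edge router: write a DSCP into the byte, preserving the low two bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return (dscp << DSCP_SHIFT) | (tos_byte & LOW_BITS)


def read_dscp(tos_byte):
    """Core router: recover the DSCP with a mask and a shift."""
    return (tos_byte & DSCP_MASK) >> DSCP_SHIFT


# Hypothetical core-router table; unknown codepoints fall back to best effort.
QUEUE_FOR_DSCP = {EF_DSCP: "EF", BE_DSCP: "BE"}


def core_classify(tos_byte):
    return QUEUE_FOR_DSCP.get(read_dscp(tos_byte), "BE")


marked = edge_mark(0x00, EF_DSCP)
print(hex(marked), core_classify(marked))  # 0xb8 EF
```

All of the per-flow decision making sits in `edge_mark`'s caller; the core lookup is a mask, a shift and a table access.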
The ability to implement advanced routing behaviors using Linux, including those proposed by Diffserv, is provided by the rich set of traffic-control features present in the Linux kernel. Alexey Kuznetsov is the author of these kernel features and the user-space programs used to control them. The architecture of the Linux traffic control features is described nicely by Almesberger (see Resources 7), and the motivation and control of these features are also summarized in an excellent LJ article by Hadi-Salim (see Resources 8). For clarity, we include a brief review of the Linux traffic-control capabilities used in our implementation and our approach to configuring them. In general, to enable “differentiated services” for Linux, first the Linux box has to be able to route IP packets correctly, and several rules for traffic control must then be put in place.
In preparation for use as a Diffserv router, the kernel of the Linux router must be configured to allow the use of advanced routing features. To implement Diffserv-type behaviors effectively, several “subsystems” of the kernel must be available. These subsystems include the routing capabilities of the kernel, the packet scheduling functions, and the netlink functionality to configure the traffic-control modules. The traffic-control functions can be compiled into a monolithic kernel or loaded as modules.
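A quick way to confirm that a kernel was built with the pieces you need is to scan its .config file. The sketch below checks only the three option names that appear later in this article; a real Diffserv router needs more than these, and the sample text is invented for the example.

```python
import re

# Option names mentioned elsewhere in this article; not an exhaustive list.
REQUIRED = ["CONFIG_NET_SCH_CBQ", "CONFIG_NET_CLS_TCINDEX", "CONFIG_NET_CLS_FW"]


def enabled_options(config_text):
    """Map option name to 'y' (built in) or 'm' (module) for enabled options."""
    found = {}
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are neither built in nor modules."""
    enabled = enabled_options(config_text)
    return [name for name in required if name not in enabled]


# Invented sample in the usual .config format.
sample = """\
CONFIG_NET_SCH_CBQ=m
# CONFIG_NET_CLS_TCINDEX is not set
CONFIG_NET_CLS_FW=y
"""
print(missing_options(sample))  # ['CONFIG_NET_CLS_TCINDEX']
```

To check a real tree, pass the contents of the kernel's .config file instead of `sample`.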
A summary of the pertinent features compiled into our Diffserv routers is shown in Listings 1 and 2. All locations given are representative of the option list given during make menuconfig. You may be checking your kernel configuration menu now, and saying to yourself, “Hmm... I don't see those choices!” That's because you haven't acquired the necessary kernel patch. The web site for “Differentiated Services on Linux” is maintained by Werner Almesberger at the Swiss Federal Institute of Technology (see Resources 9). Here you will find the “Diffserv for Linux” distribution (as of this writing, the current version was ds-6). The distribution comes with a set of patches for both the kernel and for a user-space application to configure traffic-control kernel features (called “tc”). Also included in the distribution is a set of example scripts and some documentation. It is a good idea to acquire a copy of the package iproute2+tc at this point (see Resources 10). The patch from the Linux Diffserv distribution is version-sensitive with iproute2+tc, and since our project took place mainly in the summer of 1999, we used version ss990630 of iproute2+tc.
Once your Linux router has been configured properly (depending on your router's job), you are ready to configure your machine for traffic control.
To enable differentiated services on a Linux router, the traffic-control features must be configured. This configuration is achieved through a user-level program, appropriately named tc (traffic control). The command-line syntax for tc is quite long and complex, so scripts are generally used for configuration. An example tc configuration script is shown in Listing 3. In the listing, tc is being used to configure kernel traffic control for a core router in our Diffserv application. This entails attaching a parent queuing discipline to the applicable interface, then creating the queues for the varying classes of traffic. Finally, filters are created to classify packets into the appropriate classes.
As can be seen in Listing 3, the structure of the tc configuration scripts for a Diffserv-enabled Linux router can be broken down into parts:
Creation of the root queuing discipline. This uses the syntax tc qdisc add followed by several parameters that describe the queuing discipline: which network interface it is attached to (dev eth3), an identifier for the qdisc (handle 1:0), where in the qdisc hierarchy to insert it (root) and which queuing discipline to use (tcindex). The remaining parameters are specific to the particular queuing discipline. Diffserv maps naturally into a class-based queuing scheme. Therefore, each Diffserv router (regardless of job) will employ class-based queuing (CONFIG_NET_SCH_CBQ) to house its various per-hop behaviors.
Creation of classes for each type of per-hop behavior. This uses the syntax tc class add followed by several parameters, similar to those of tc qdisc add. Some identify which queuing discipline the class belongs to, and the others define the behavior of the class. Our demonstration made extensive use of two per-hop behaviors: best effort (BE) and expedited forwarding (EF). The configuration in Listing 3 clearly shows the two sections defining BE and EF PHBs.
Creation of queuing disciplines for each class. Each class must have a queuing discipline to determine how packets are enqueued and dequeued. The syntax for this step is identical to that for step 1. The EF PHB class uses a simple FIFO (first-in, first-out) for its queuing discipline, since we wanted the traffic to get in and out of the class as quickly as possible. The BE PHB class uses a token bucket filter in an attempt to throttle the traffic-generation machines during times of extreme congestion.
Creation of filters (classifiers) to assign marked traffic to the appropriate class. This uses the syntax tc filter add followed by several parameters used to describe which packets are bound for what classes. Our sample script is from a core router. Packets arriving at this interface have already been marked by edge routers. Classifying packets at this step requires matching the TOS (type of service) bits from the IP header to values suggested by the IETF (Internet Engineering Task Force) Differentiated Services workgroup for various per-hop behaviors (denoted by the value following the “mask” parameter). The filter creation varies, based on which job the router fulfills. Core routers solely use the tcindex packet classifier (CONFIG_NET_CLS_TCINDEX) included with the Diffserv distributions. Edge routers use the firewall packet classifier (CONFIG_NET_CLS_FW) along with ipchains.
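The token bucket filter used for the BE class in the third step is worth a closer look, because the idea is simple even though the tc parameters are not. The Python sketch below models only the accounting: it answers "may this packet go now?" and does not delay packets as the kernel's filter does. The rate and burst values are invented for the example.

```python
class TokenBucket:
    """Tokens (bytes) accumulate at a fixed rate up to a burst limit;
    a packet may be sent only if enough tokens are available."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous update, in seconds

    def allow(self, packet_bytes, now):
        # Refill for the time elapsed since the last call, capped at the burst.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# Assumed values: 1000 bytes/s sustained rate, 1500-byte burst.
bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
decisions = [
    bucket.allow(1500, now=0.0),  # full bucket: allowed
    bucket.allow(1500, now=0.5),  # only 500 tokens refilled: refused
    bucket.allow(1500, now=1.5),  # 1500 tokens available again: allowed
]
print(decisions)  # [True, False, True]
```

A sender that bursts is let through once, then held to the sustained rate, which is the throttling effect we wanted on the traffic-generation machines.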
Complete Diffserv functionality really assumes two different types of routing capabilities: “core” and “edge” routers. With a Linux-based Diffserv implementation, “edge” routers use ipchains to handle their tasks. Replacing the application ipfwadm from earlier kernels, ipchains is a user-space program that configures the firewalling functionalities of Linux kernels 2.1.x and higher. Configuring ipchains has been well-documented in this magazine (see Resources 11) and other arenas, and is beyond the scope of this document. Our Linux Diffserv testbed uses ipchains to assign handles to incoming traffic based on IP address rules. These handles are then used by a filter (classifier) installed with tc (the user-space application) to replace the current IP TOS byte setting with the appropriate Diffserv field marking (DSCP). This method proved to be very effective. Dynamic configuration was easily attainable, and the speed of ipchains held up to very high demand. Even though ipchains will be superseded by iptables in future versions of the Linux kernel (see Resources 12), the functionality will be very similar. So, the approach we've used will still be applicable.
The specific scripts used to provide Diffserv capability in our testbed environment are available at ftp://cter.eng.uab.edu/Diffserv/.
A goal of our demonstration environment, in addition to concisely demonstrating the effect of differentiated services, was to prove that the queuing mechanisms within the Linux Diffserv implementation were robust enough to enforce various SLAs throughout our Diffserv domain. As shown in Figure 1, the domain was composed of three routers (one core router, two leaf routers), two Litton CAMVision-2 MPEG-2 codecs (up to 15Mbps) or two Vbrick MPEG-1 codecs (up to 3Mbps), two client workstations, one web server and one network management workstation (NMS).
In the figure, the classification of traffic is performed by the leaf routers “obiwan” and “nimitz”, and the core router “quigon” is configured for the corresponding DSCP-based forwarding and queuing. The traffic streams are color-coded to correspond to particular types of PHBs (blue=BE, red=EF and so on). Notice from the figure that the link between quigon and nimitz is 10Mbps Ethernet and is consistently oversubscribed with multiservice traffic. This is the situation where differentiation between SLAs is critical. To make sure the instantaneous change between SLAs was clearly visible to the casual observer, we used the MPEG video stream as well as some interactive, web-based streaming media (RealAudio, RealVideo, etc.).
As shown in Table 1 and Figure 1, we were able to configure several service levels with our approach, each of which was available via a single mouse click. Note that the values and configurations shown in Table 1 and Figure 1 reflect a particular set of SLAs which used only BE and EF traffic classes. When the user clicks on the desired SLA icon, the value from the HTML form field is passed to the web server via an HTTP POST operation. The form values are passed via CGI to a Perl script that processes the POST, then reconfigures each router in the domain. The routers are contacted one by one, and the SLA chosen by the administrator is invoked. Sample Perl pseudocode for the client portion of router control is shown in Listing 4, and the server portion is shown in Listing 5. As can be seen from the Perl client code in Listing 4, the NMS (or other web server) can easily pass the “current SLA” to all routers in the domain based on input from the network manager. This “control channel” interface was protected in all network configurations by a high-priority, low-rate queuing configuration, shown as the black line in Figure 1.
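The "contact each router in turn" step can be sketched as follows. Our own client was written in Perl; this Python version is a loose illustration only. The port number, the one-line wire format and the SLA name are all hypothetical, and the demonstration uses a stand-in sender so that it runs without a network.

```python
import socket


def tcp_send(router, sla, port=5000, timeout=2.0):
    """Send the SLA name to one router as a single text line.
    The port and the wire format are assumptions made for this sketch."""
    with socket.create_connection((router, port), timeout=timeout) as conn:
        conn.sendall(sla.encode("ascii") + b"\n")


def push_sla(routers, sla, send=tcp_send):
    """Contact the routers one by one; report which ones accepted the SLA."""
    results = {}
    for router in routers:
        try:
            send(router, sla)
            results[router] = True
        except OSError:
            # An unreachable router is recorded, and the loop continues.
            results[router] = False
    return results


# Demonstration with a stand-in sender instead of real sockets.
delivered = []


def fake_send(router, sla):
    if router == "nimitz":
        raise OSError("router unreachable")
    delivered.append((router, sla))


status = push_sla(["obiwan", "quigon", "nimitz"], "video-ef", send=fake_send)
print(status)
```

Passing the sender as a parameter keeps the loop testable and makes it easy to substitute whatever transport the control channel actually uses.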
To provide positive user feedback at the NMS, the web interface is refreshed for the administrator while each router begins its unique network setup. Each Diffserv-enabled router in the domain receives the desired SLA and must set up its rules accordingly, depending on its position within the domain and the collection of statically defined SLAs. This is done dynamically via a system call to ipchains-restore according to the new SLA. When the ipchains-restore command finishes, the network setup is complete. The Perl pseudocode for this operation is shown in Listing 5 for a typical core router. As our system is defined, we maintain essentially a simple “database” of network/SLA configurations in pre-stored ipchains mappings.
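On the router side, the "database" lookup followed by ipchains-restore might look like the sketch below. Again, our server was written in Perl; the SLA names and file paths here are hypothetical. The only behavior relied upon is that ipchains-restore reads previously saved rules from standard input. The demonstration substitutes stand-ins for the file and the command, so nothing is changed on the machine that runs it.

```python
import io
import subprocess
from types import SimpleNamespace

# Hypothetical SLA names mapped to hypothetical files of saved ipchains rules.
SLA_RULES = {
    "best-effort": "/etc/diffserv/best-effort.rules",
    "video-ef": "/etc/diffserv/video-ef.rules",
}


def apply_sla(sla, rules=SLA_RULES, run=subprocess.run, opener=open):
    """Feed the saved rules for the chosen SLA to ipchains-restore."""
    if sla not in rules:
        raise KeyError(f"unknown SLA: {sla}")
    with opener(rules[sla], "rb") as saved_rules:
        completed = run(["ipchains-restore"], stdin=saved_rules)
    return completed.returncode == 0


# Dry run: record what would be executed instead of touching the firewall.
calls = []


def fake_run(command, stdin):
    calls.append((command, stdin.read()))
    return SimpleNamespace(returncode=0)


def fake_open(path, mode):
    return io.BytesIO(b"saved rules for " + path.encode("ascii"))


ok = apply_sla("video-ef", run=fake_run, opener=fake_open)
print(ok, calls[0][0])
```

Rejecting unknown SLA names before running anything mirrors the static nature of the configuration set: only the predefined mappings can ever be installed.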
To attempt to simulate some typical end-user traffic in addition to the constant MPEG stream, we used a number of FTP downloads, some streaming audio/video sources and a small flood ping throughout the network. Due to the interactive nature of our demonstration environment, these network-based data sources were also available “on demand” from a web-based GUI.
It would be inappropriate, particularly with respect to any open-source developments, to neglect mentioning related efforts, or efforts which have contributed to the system described in this article. Two Linux-based Diffserv projects we feel are especially interesting and mature are the efforts underway at the University of Karlsruhe (see Resources 13) and the University of Kansas (see Resources 14). Many of the conclusions and insights made available through these projects correspond with our own observations, and they are excellent sources of further information on Diffserv and differentiated services under Linux. We highly recommend them to the interested reader.
In particular, we want to call attention to the differences between the demonstration environment described in this article and the DiffSpec tool under construction at the University of Kansas. The Diffserv approach to resource allocation for each class of service very explicitly requires external intervention in the form of what has been called a “bandwidth broker” (BB). The DiffSpec tool entails a much grander system concept than the demonstration environment discussed here. For example, DiffSpec includes an API for managing queue/class/filter combinations, CORBA-based system calls for automated configuration of DS parameters, and a general web-based user interface to the Linux traffic-control capabilities.
In contrast, for the purpose of our demonstration environment, we unwittingly followed a “separation of powers” philosophy well known to students of political science. We carefully segregated the “service level definition” (or “legislative branch”) functions of the BB into a manually crafted, static database of allowable configurations. At the same time, we placed the “network instantiation” (or “executive branch”) functions of the BB onto a cleverly distributed arrangement of ipchains rules. In this fashion, we are able to reconfigure the entire network instantly to one of several predefined “looks” through either an operator's input or by an automated means. This approach may be scalable in some contexts, and it may provide for convenient “governance” of network resources, but it was not specifically intended for mass consumption.
Additionally, the structure of the Karlsruhe Diffserv implementation seems to be somewhat different from the implementation maintained by Werner Almesberger at the Swiss Federal Institute of Technology in Lausanne. For our project, we used Almesberger's distribution, so we don't have specific experience with the Karlsruhe distribution or the differences between the two implementations.
In reviewing the architecture and explicit results provided by the K.I.D.S. project (see Resources 4), we agree with their conclusions regarding strengths and weaknesses of Diffserv. In particular, through the use of our “before-after” scenario for configuring a Diffserv domain, we have experienced first-hand corroboration of these factors in the context of our “real applications”.
We also agree with Metz, who states (see Resources 1) that “In the long run it will most likely be a combination (of technologies) that will enable the Internet to offer QoS.” When the long run materializes, we're confident that Linux will be a part of the solution, because QoS in the Internet is definitely “where you want to go tomorrow”.
Michael Stricklen (firstname.lastname@example.org) is a research assistant at the UAB Center for Telecommunications Education and Research. He enjoys tinkering with Linux projects and networking gear. When not in front of a computer, Michael would prefer to be riding a snowboard.
Bob Cummings (email@example.com) is a research assistant at the UAB Center for Telecommunications Education and Research. In his spare time, he likes to spend time with friends, play RPGs and hone his Perl skills.
Stan McClellan (firstname.lastname@example.org) is a Linux enthusiast who happens to be employed in the UAB Dept. of Electrical and Computer Engineering. When he isn't “playing Professor” by hustling research money or teaching classes, he can be found wandering around, pestering students for “interesting results”.