IP Bandwidth Management

A look at the new traffic control code in the kernel and how it aids in bandwidth management.

The following is an example of a Linux box with two virtual servers (web, FTP, etc.) and how an ISP can sell them as two separate packages based on the maximum bandwidth one gets when visiting those two virtual servers.

Kernel Compile

I'll assume you know how to compile the kernel and add network and aliasing support. I used kernel 2.1.129 and a few other patches at the time of this writing. Linux 2.2 pre1 had just come out, but the patches had not made it in yet. By the time you read this, 2.2 will be out and everything I used will be included.

The first challenge is the clock source. In order to accurately determine bandwidth measurements, you need a very fine-grained clock. In Linux, the clock runs at a frequency of HZ, defined to be 100 for the ix86, i.e., 100 clock ticks per second which translates to a granularity of 10ms for each clock tick. On Alphas, HZ is defined as 1000, giving a granularity of 1ms. I would not suggest changing the value of HZ in the code. The TC clock sources are adjusted by editing the file /include/net/pkt_sched.h under the kernel tree and modifying the line which defines PSCHED_CLOCK_SOURCE. To start, I suggest leaving the clock source alone until you get comfortable with running other things. The default clock source, PSCHED_JIFFIES, will work fine on all architectures. Use PSCHED_CPU on high-end Pentiums and Alphas. The most precise and expensive clock source is PSCHED_GETTIMEOFDAY. Use this if you have a truly high-end Pentium II or Alpha. Do not try to use it on a 486.

Next, compile the kernel. Select Kernel/User netlink socket and Netlink device emulation to allow use of netlink so that tc can talk to the kernel. The second option is a backward compatibility option and may be obsolete now that 2.2 is out, so don't worry if you don't see it. Next, compile in all the queueing disciplines and classifiers. Although each can be selected as a module, I compiled them straight in. The selections are QoS or fair queueing, CBQ packet scheduler, CSZ packet scheduler, the simplest PRIO pseudoscheduler, RED queue, SFQ queue, TBF queue, QoS support, rate estimator, packet classifier API, routing-tables-based classifier, U32 classifier, special RSVP classifier and special RSVP classifier for IPv6.

Go through the normal procedure of compiling and installing the kernel.

Compiling and Setting Up TC

If you use glibc as I do (Red Hat 5.2), you will need to apply the glibc patch. A glibc patched source for tc is included (tc-glibc-patched.tgz). The major catch is to change the Makefile to point to where the kernel include file is. Typing make should then cleanly compile tc and ip for you. The ip-routing directory contains patches with names iproute2-*.glibc2.patch.gz. Get the latest one to match with the current tc. I downloaded iproute2-2.1.99-now-ss981101.glibc.patch.gz at the time of this writing.

tc Setup

Figure 2. CBQ Tree Diagram

Figure 2 shows the simple scenario we are going to set up. Two leaf nodes are emanating from the root. IP address (classid 1:1) and (classid 1:2) are aliases on device eth0. They all share the same parent—classid 1:0 (the root node). Again, the intent is to show what one can do without going into fine details or building a complex TC setup. With some modifications, one can build more interesting setups with multiple devices.

The general recipe for setting up the QoS features is to first attach a qdisc to a device. In the sample script, this is achieved by the line

qdisc add dev eth0 root handle 1: ...

Next, define your classes. This allows you to discriminate between the different traffic types going out. In the sample script, this is achieved by the lines which start with

tc class add dev eth0 parent 1:0 classid X:Y ...
In the sample script, a one-level tree is shown. However, one can build a multiple depth tree. Basically, a child node (as shown in Figure 2) inherits from a parent and is then further resource-restricted by the class definition. For example, the root class 1:0 owns the device's bandwidth. The child node 1:1 cannot have more than 10Mbits allocated to it, but is restricted to 1Mbps. Eventually, the leaf nodes get packets sent to them based on the classifier mapping packets to them. This is quite similar to the UNIX directory and file tree structures. You can think of non-leaf nodes as directories and leaf nodes as files.

Finally, define your packet-to-class mappings to tell your classifier which packets to send to which class. You must define the classes for this to make sense. First, attach a classifier to the proper class. In the sample script, this is achieved by the construct which starts with the line

filter add dev eth0 parent 1:0 protocol ip ...

Next, define the packet-to-class mappings that will be used. In the sample script, this is defined in the constructs that define the matching criteria (such as match ip src ...). Always map packets to leaf classes.

If you follow this recipe and substitute the right syntax for the different queueing disciplines and filters, you will get it right. The appropriate details are in the options.

Listing 1.