Network Buffers and Memory Management

Writing a network device driver for Linux is fundamentally simple—most of the complexity (other than talking to the hardware) involves managing network packets in memory.
Protocol Layer Variables

In order for the network protocol la

yers to perform in a sensible manner, the device has to provide a set of capability flags and variables that are also maintained in the device structure.

The mtu is the largest payload that can be sent over this interface, i.e., the largest packet size not including any bottom layer headers that the device itself will provide. This number is used by the protocol layers such as IP to select suitable packet sizes to send. There are minimums imposed by each protocol. A device is not usable for IPX without a 576 byte frame size or higher. IP needs at least 72 bytes and does not perform sensibly below about 200 bytes. It is up to the protocol layers to decide whether to co-operate with your device.

The family is always set to AF_INET and indicates the protocol family the device is using. Linux allows a device to be using multiple protocol families at once, and maintains this information solely to look more like the standard BSD networking API.

The interface hardware type field is taken from a table of physical media types. The values used by the ARP protocol (see RFC1700) are used by those media that support ARP, and additional values are assigned for other physical layers. New values are added whenever necessary both to the kernel and to net-tools, the package containing programs like ifconfig that need to be able to decode this field. The fields defined as of Linux pre2.0.5 are:

From RFC1700:
ARPHRD_ETHER         10 and 100Mbit/second Ethernet
ARPHRD_EETHER   Experimental Ethernet (not used)
ARPHRD_AX25          AX.25 level 2 interfaces
ARPHRD_PRONET        PROnet token ring (not used)
ARPHRD_CHAOS         ChaosNET (not used)
ARPHRD_IEE802        802.2 networks notably token ring
ARPHRD_ARCNET        ARCnet interfaces
ARPHRD_DLCI          Frame Relay DLCI
Defined by Linux:
ARPHRD_SLIP          Serial Line IP protocol
ARPHRD_CSLIP         SLIP with VJ header compression
ARPHRD_SLIP6         6bit encoded SLIP
ARPHRD_CSLIP6        6bit encoded header compressed SLIP
ARPHRD_ADAPT         SLIP interface in adaptive mode
ARPHRD_PPP      PPP interfaces (async and sync)
ARPHRD_TUNNEL6  IPv6 over IP tunnels
ARPHRD_FRAD          Frame Relay Access Device
ARPHRD_SKIP          SKIP encryption tunnel
ARPHRD_LOOPBACK Loopback device
ARPHRD_LOCALTLK Localtalk apple networking device
ARPHRD_METRICOM Metricom Radio Network

Those interfaces marked unused are defined types but without any current support on the existing net-tools. The Linux kernel provides additional generic support routines for devices using Ethernet and token ring.

The pa_addr field is used to hold the IP address when the interface is up. Interfaces should start down with this variable clear. pa_brdaddr is used to hold the configured broadcast address, pa_dstaddr is the target of a point to point link, and pa_mask is the IP netmask of the interface. All of these can be initialized to zero. The pa_alen field holds the length of an address (in our case an IP address), and should be initialized to 4.

Link Layer Variables

The hard_header_len is the number of bytes the device needs at the start of a network buffer passed to it. This value does not have to equal the number of bytes of physical header that will be added, although this number is usually used. A device can use this value to provide itself with a scratch pad at the start of each buffer.

In the 1.2.x series kernels, the skb->data pointer will point to the buffer start, and you must avoid sending your scratch pad. This also means that for devices with variable length headers you need to allocate max_size+1 bytes and keep a length byte at the start so that you know where the header actually begins (the header should be contiguous with the data). Linux 1.3.x makes life much simpler. It ensures that you have at least as much room as you requested, free at the start of the buffer. It is up to you to use skb_push() appropriately, as we discussed in the section on networking buffers.

The physical media addresses (if any) are maintained in dev_addr and broadcast respectively and are byte arrays. Addresses smaller than the size of the array are stored starting from the left. The addr_len field is used to hold the length of a hardware address. With many media there is no hardware address, and in this case, this field should be set to zero. For some other interfaces, the address must be set by a user program. The ifconfig tool permits the setting of an interface hardware address. In this case it need not be set initially, but the open code should take care not to allow a device to start transmitting before an address has been set.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

What about rmem_max / rmem_default ?

Anonymous's picture

An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?

Thanks for anyone who would make it clear to me,

Missing pictures

Ovy's picture

The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).

Thank you,


Mitch Frazier's picture

Should be working now.

Mitch Frazier is an Associate Editor for Linux Journal.


Ravikumar's picture

thanx for the great article..

at each layer the data and tail pointers change right??

so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??

Help Required....

Ram's picture

Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.

i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.

Thanx in adavance,

good article

Ajay Thakur's picture

thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.