Network Buffers and Memory Management

Writing a network device driver for Linux is fundamentally simple—most of the complexity (other than talking to the hardware) involves managing network packets in memory.
Registering a Device

Each device is created by filling in a struct device object and passing it to the register_netdev(struct device *) call. This links your device structure into the kernel network device tables. As the structure you pass in is used by the kernel, you must not free this until you have unloaded the device with void unregister_netdev(struct device *) calls. These calls are normally done at boot time or at module load/unload.

The kernel will not object if you create multiple devices with the same name, it will break. Therefore, if your driver is a loadable module you should use the struct device *dev_get(const char *name) call to ensure the name is not already in use. If it is in use, you should pick another name or your new driver will fail. If you discover a clash, you must not use unregister_netdev() to unregister the other device using the name!

A typical code sequence for registration is:

int register_my_device(void)
{
  int i=0;
  for(i=0;i<100;i++)
  {
    sprintf(mydevice.name,"mydev%d",i);
    if(dev_get(mydevice.name)==NULL)
    {
      if(register_netdev(&mydevice)!=0)
        return -EIO;
      return 0;
    }
  }
  printk(
"100 mydevs loaded. Unable to load more.\n");
  return -ENFILE;
}
The Device Structure

All the generic information and methods for each network device are kept in the device structure. To create a device you need to supply the structure with most of the data discussed below. This section covers how a device should be set up.

Naming

First, the name field holds a string pointer to a device name in the formats discussed previously. The name field can also be " " (four spaces), in which case the kernel automatically assigns an ethn name to it. This special feature should not be used. After Linux 2.0, we intend to add a simple support function of the form dev_make_name("eth") for this purpose.

Bus Interface Parameters

The next block of parameters is used to maintain the location of a device within the device address spaces of the architecture. The irq field holds the interrupt (IRQ) the device is using, and is normally set at boot time or by the initialization function. If an interrupt is not used, not currently known or not assigned, the value zero should be used. The interrupt can be set in a variety of fashions. The auto-irq facilities of the kernel can be used to probe for the device interrupt, or the interrupt can be set when loading the network module. Network drivers normally use a global int called irq for this so that users can load the module with insmod mydevice irq=5 style commands. Finally, the IRQ field can be set dynamically using the ifconfig command, which causes a call to your device that will be discussed later on.

The base_addr field is the base I/O space address where the device resides. If the device uses no I/O locations or is running on a system without an I/O space concept, this field should be set to zero. When this address is user settable, it is normally set by a global variable called io. The interface I/O address can also be set with ifconfig.

Two hardware-shared memory ranges are defined for things like ISA bus shared memory Ethernet cards. For current purposes, the rmem_start and rmem_end fields are obsolete and should be loaded with 0. The mem_start and mem_end addresses should be loaded with the start and end of the shared memory block used by this device. If no shared memory block is used, then the value 0 should be stored. Those devices that allow the user to to set the memory base use a global variable called mem, and then set the mem_end address appropriately themselves.

The dma variable holds the DMA channel in use by the device. Linux allows DMA (like interrupts) to be automatically probed. If no DMA channel is used or the DMA channel is not yet set, the value 0 is used. This option may have to change, since the latest PC boards allow ISA bus DMA channel 0 to be used by hardware boards and do not just tie it to memory refresh. If the user can set the DMA channel, the global variable dma is used.

It is important to realise that the physical information is provided for control and user viewing (as well as the driver's internal functions), and does not register these areas to prevent them being reused. Thus, the device driver must also allocate and register the I/O, DMA and interrupt lines it wishes to use, using the same kernel functions as any other device driver. [See the recent Kernel Korner articles on writing a character device driver in issues 23, 24, 25, 26 and 28, or visit the new Linux Kernel Hackers' Guide at www.redhat.com:8080/HyperNews/get/khg.html, for more information on the necessary functions—ED]

The if_port field holds the physical media type for multi-media devices such as combo Ethernet boards.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

What about rmem_max / rmem_default ?

Anonymous's picture

An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?

Thanks for anyone who would make it clear to me,
Telenn

Missing pictures

Ovy's picture

The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).

Thank you,
Ovy

Fixed

Mitch Frazier's picture

Should be working now.

Mitch Frazier is an Associate Editor for Linux Journal.

thnx

Ravikumar's picture

thanx for the great article..

at each layer the data and tail pointers change right??

so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??

Help Required....

Ram's picture

Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.

i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.

Thanx in adavance,

good article

Ajay Thakur's picture

thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.

Ajay

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix