Network Buffers and Memory Management
A set of flags is used to maintain the interface properties. Some of these are “compatibility” items and as such are not directly useful. The flags are:
IFF_UP The interface is currently active. In Linux, the IFF_RUNNING and IFF_UP flags are basically handled as a pair, existing as two items for compatibility reasons. When an interface is not marked as IFF_UP, it can be removed. Unlike BSD, an interface that does not have IFF_UP set will never receive packets.
IFF_BROADCAST The interface has broadcast capability. There will be a valid IP address for the interface stored in the device addresses.
IFF_DEBUG Indicates debugging is desired. Not currently used.
IFF_LOOPBACK The loopback interface (lo) is the only interface that has this flag set. Setting it on other interfaces is neither defined nor a very good idea.
IFF_POINTOPOINT This interface is a point to point link (such as SLIP or PPP). There is no broadcast capability as such. The remote point to point address in the device structure is valid. Normally, a point to point link has no netmask or broadcast, but it can be enabled if needed.
IFF_NOTRAILERS More of a prehistoric than an historic compatibility flag. Not used.
IFF_RUNNING See IFF_UP
IFF_NOARP The interface does not perform ARP queries. Such an interface must have either a static table of address conversions or no need to perform mappings. The NetROM interface is a good example of this. Here all entries are hand configured as the NetROM protocol cannot do ARP queries.
IFF_PROMISC If it is possible, the interface will hear all of the packets on the network. This flag is typically used for network monitoring, although it can also be used for bridging. One or two interfaces like the AX.25 interfaces are always in promiscuous mode.
IFF_ALLMULTI Receive all multicast packets. An interface, that cannot perform this operation but can receive all packets, will go into promiscuous mode when asked to perform this task.
IFF_MULTICAST Indicates that the interface supports multicast IP traffic, which is not the same as supporting a physical multicast. AX.25 for example supports IP multicast using physical broadcast. Point to point protocols such as SLIP generally support IP multicast.
Packets are queued for an interface by the kernel protocol code. Within each device, buffs[] is an array of packet queues for each kernel priority level. These are maintained entirely by the kernel code, but must be initialized by the device itself on boot up. The intialization code used is:
int ct=0;
while(ct<DEV_NUMBUFFS)
{
skb_queue_head_init(&dev->buffs[ct]);
ct++;
}
All other fields should be initialized to 0.
The device gets to select the queue length it needs by setting the field dev->tx_queue_len to the maximum number of frames the kernel should queue for the device. Typically this is around 100 for Ethernet and 10 for serial lines. A device can modify this dynamically, although its effect will lag the change slightly.
Each network device has to provide a set of actual functions (methods) for the basic low level operations. It should also provide a set of support functions that interface the protocol layer to the protocol requirements of the link layer it is providing.
The init method is called when the device is initialized and registered with the system, in order to perform any low level verification and checking needed. It returns an error code if the device is not present, if areas cannot be registered or if it is otherwise unable to proceed. If the init method returns an error, the register_netdev() call returns the error code, and the device is not created.
All devices must provide a transmit function. It is possible for a device to exist that cannot transmit. In this case, the device needs a transmit function that simply frees the buffer passed to it. The dummy device has exactly this functionality on transmit.
The dev->hard_start_xmit() function is called to provide the driver with its own device pointer and network buffer (a sk_buff) for transmitting. If your device is unable to accept the buffer, it should return 1 and set dev->tbusy to a non-zero value. This action will queue the buffer to be retried again later, although there is no guarantee that a retry will occur. If the protocol layer decides to free the buffer that the driver has rejected, then the buffer will not be offered back to the device. If the device knows the buffer cannot be transmitted in the near future, for example due to bad congestion, it can call dev_kfree_skb() to dump the buffer and return 0 indicating the buffer has been processed.
If there is space the buffer should be processed. The buffer handed down already contains all the headers, including link layer headers, necessary and need only be loaded into the hardware for transmission. In addition, the buffer is locked, which means that the device driver has absolute ownership of the buffer until it chooses to relinquish it. The contents of a sk_buff remain read-only, with the exception that you are guaranteed that the next/previous pointers are free, so that you can use the sk_buff list primitives to build internal chains of buffers.
When the buffer has been loaded into the hardware or, in the case of some DMA driven devices, when the hardware has indicated transmission is complete, the driver must release the buffer by calling dev_kfree_skb(skb, FREE_WRITE). As soon as this call is made, the sk_buff in question may spontaneously disappear, and the device driver should not reference it again.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- What's the tweeting protocol?
- New Products
- Trying to Tame the Tablet
- Dart: a New Web Programming Experience
- Reply to comment | Linux Journal
15 hours 23 min ago - Reply to comment | Linux Journal
17 hours 56 min ago - Reply to comment | Linux Journal
19 hours 13 min ago - great post
19 hours 48 min ago - Google Docs
20 hours 10 min ago - Reply to comment | Linux Journal
1 day 59 min ago - Reply to comment | Linux Journal
1 day 1 hour ago - Web Hosting IQ
1 day 3 hours ago - Thanks for taking the time to
1 day 4 hours ago - Linux is good
1 day 6 hours ago




Comments
What about rmem_max / rmem_default ?
An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?
Thanks for anyone who would make it clear to me,
Telenn
Missing pictures
The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).
Thank you,
Ovy
Fixed
Should be working now.
Mitch Frazier is an Associate Editor for Linux Journal.
thnx
thanx for the great article..
at each layer the data and tail pointers change right??
so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??
Help Required....
Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.
i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.
Thanx in adavance,
good article
thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.
Ajay