Network Buffers and Memory Management
In order for the network protocol la
yers to perform in a sensible manner, the device has to provide a set of capability flags and variables that are also maintained in the device structure.
The mtu is the largest payload that can be sent over this interface, i.e., the largest packet size not including any bottom layer headers that the device itself will provide. This number is used by the protocol layers such as IP to select suitable packet sizes to send. There are minimums imposed by each protocol. A device is not usable for IPX without a 576 byte frame size or higher. IP needs at least 72 bytes and does not perform sensibly below about 200 bytes. It is up to the protocol layers to decide whether to co-operate with your device.
The family is always set to AF_INET and indicates the protocol family the device is using. Linux allows a device to be using multiple protocol families at once, and maintains this information solely to look more like the standard BSD networking API.
The interface hardware type field is taken from a table of physical media types. The values used by the ARP protocol (see RFC1700) are used by those media that support ARP, and additional values are assigned for other physical layers. New values are added whenever necessary both to the kernel and to net-tools, the package containing programs like ifconfig that need to be able to decode this field. The fields defined as of Linux pre2.0.5 are:
From RFC1700: ARPHRD_NETROM NET/ROM™ devices ARPHRD_ETHER 10 and 100Mbit/second Ethernet ARPHRD_EETHER Experimental Ethernet (not used) ARPHRD_AX25 AX.25 level 2 interfaces ARPHRD_PRONET PROnet token ring (not used) ARPHRD_CHAOS ChaosNET (not used) ARPHRD_IEE802 802.2 networks notably token ring ARPHRD_ARCNET ARCnet interfaces ARPHRD_DLCI Frame Relay DLCI Defined by Linux: ARPHRD_SLIP Serial Line IP protocol ARPHRD_CSLIP SLIP with VJ header compression ARPHRD_SLIP6 6bit encoded SLIP ARPHRD_CSLIP6 6bit encoded header compressed SLIP ARPHRD_ADAPT SLIP interface in adaptive mode ARPHRD_PPP PPP interfaces (async and sync) ARPHRD_TUNNEL IPIP tunnels ARPHRD_TUNNEL6 IPv6 over IP tunnels ARPHRD_FRAD Frame Relay Access Device ARPHRD_SKIP SKIP encryption tunnel ARPHRD_LOOPBACK Loopback device ARPHRD_LOCALTLK Localtalk apple networking device ARPHRD_METRICOM Metricom Radio Network
Those interfaces marked unused are defined types but without any current support on the existing net-tools. The Linux kernel provides additional generic support routines for devices using Ethernet and token ring.
The pa_addr field is used to hold the IP address when the interface is up. Interfaces should start down with this variable clear. pa_brdaddr is used to hold the configured broadcast address, pa_dstaddr is the target of a point to point link, and pa_mask is the IP netmask of the interface. All of these can be initialized to zero. The pa_alen field holds the length of an address (in our case an IP address), and should be initialized to 4.
The hard_header_len is the number of bytes the device needs at the start of a network buffer passed to it. This value does not have to equal the number of bytes of physical header that will be added, although this number is usually used. A device can use this value to provide itself with a scratch pad at the start of each buffer.
In the 1.2.x series kernels, the skb->data pointer will point to the buffer start, and you must avoid sending your scratch pad. This also means that for devices with variable length headers you need to allocate max_size+1 bytes and keep a length byte at the start so that you know where the header actually begins (the header should be contiguous with the data). Linux 1.3.x makes life much simpler. It ensures that you have at least as much room as you requested, free at the start of the buffer. It is up to you to use skb_push() appropriately, as we discussed in the section on networking buffers.
The physical media addresses (if any) are maintained in dev_addr and broadcast respectively and are byte arrays. Addresses smaller than the size of the array are stored starting from the left. The addr_len field is used to hold the length of a hardware address. With many media there is no hardware address, and in this case, this field should be set to zero. For some other interfaces, the address must be set by a user program. The ifconfig tool permits the setting of an interface hardware address. In this case it need not be set initially, but the open code should take care not to allow a device to start transmitting before an address has been set.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
- Using Salt Stack and Vagrant for Drupal Development
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- New Products
- New Products
- The Pari Package On Linux
- Dart: a New Web Programming Experience
- This is the easiest tutorial
2 hours 33 min ago - Ahh, the Koolaid.
8 hours 12 min ago - git-annex assistant
14 hours 11 min ago - direct cable connection
14 hours 34 min ago - Agreed on AirDroid. With my
14 hours 44 min ago - I just learned this
14 hours 48 min ago - enterprise
15 hours 18 min ago - not living upto the mobile revolution
18 hours 10 min ago - Deceptive Advertising and
18 hours 45 min ago - Let\'s declare that you have
18 hours 46 min ago




Comments
What about rmem_max / rmem_default ?
An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?
Thanks for anyone who would make it clear to me,
Telenn
Missing pictures
The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).
Thank you,
Ovy
Fixed
Should be working now.
Mitch Frazier is an Associate Editor for Linux Journal.
thnx
thanx for the great article..
at each layer the data and tail pointers change right??
so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??
Help Required....
Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.
i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.
Thanx in adavance,
good article
thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.
Ajay