Network Buffers and Memory Management
In order for the network protocol la
yers to perform in a sensible manner, the device has to provide a set of capability flags and variables that are also maintained in the device structure.
The mtu is the largest payload that can be sent over this interface, i.e., the largest packet size not including any bottom layer headers that the device itself will provide. This number is used by the protocol layers such as IP to select suitable packet sizes to send. There are minimums imposed by each protocol. A device is not usable for IPX without a 576 byte frame size or higher. IP needs at least 72 bytes and does not perform sensibly below about 200 bytes. It is up to the protocol layers to decide whether to co-operate with your device.
The family is always set to AF_INET and indicates the protocol family the device is using. Linux allows a device to be using multiple protocol families at once, and maintains this information solely to look more like the standard BSD networking API.
The interface hardware type field is taken from a table of physical media types. The values used by the ARP protocol (see RFC1700) are used by those media that support ARP, and additional values are assigned for other physical layers. New values are added whenever necessary both to the kernel and to net-tools, the package containing programs like ifconfig that need to be able to decode this field. The fields defined as of Linux pre2.0.5 are:
From RFC1700: ARPHRD_NETROM NET/ROM™ devices ARPHRD_ETHER 10 and 100Mbit/second Ethernet ARPHRD_EETHER Experimental Ethernet (not used) ARPHRD_AX25 AX.25 level 2 interfaces ARPHRD_PRONET PROnet token ring (not used) ARPHRD_CHAOS ChaosNET (not used) ARPHRD_IEE802 802.2 networks notably token ring ARPHRD_ARCNET ARCnet interfaces ARPHRD_DLCI Frame Relay DLCI Defined by Linux: ARPHRD_SLIP Serial Line IP protocol ARPHRD_CSLIP SLIP with VJ header compression ARPHRD_SLIP6 6bit encoded SLIP ARPHRD_CSLIP6 6bit encoded header compressed SLIP ARPHRD_ADAPT SLIP interface in adaptive mode ARPHRD_PPP PPP interfaces (async and sync) ARPHRD_TUNNEL IPIP tunnels ARPHRD_TUNNEL6 IPv6 over IP tunnels ARPHRD_FRAD Frame Relay Access Device ARPHRD_SKIP SKIP encryption tunnel ARPHRD_LOOPBACK Loopback device ARPHRD_LOCALTLK Localtalk apple networking device ARPHRD_METRICOM Metricom Radio Network
Those interfaces marked unused are defined types but without any current support on the existing net-tools. The Linux kernel provides additional generic support routines for devices using Ethernet and token ring.
The pa_addr field is used to hold the IP address when the interface is up. Interfaces should start down with this variable clear. pa_brdaddr is used to hold the configured broadcast address, pa_dstaddr is the target of a point to point link, and pa_mask is the IP netmask of the interface. All of these can be initialized to zero. The pa_alen field holds the length of an address (in our case an IP address), and should be initialized to 4.
The hard_header_len is the number of bytes the device needs at the start of a network buffer passed to it. This value does not have to equal the number of bytes of physical header that will be added, although this number is usually used. A device can use this value to provide itself with a scratch pad at the start of each buffer.
In the 1.2.x series kernels, the skb->data pointer will point to the buffer start, and you must avoid sending your scratch pad. This also means that for devices with variable length headers you need to allocate max_size+1 bytes and keep a length byte at the start so that you know where the header actually begins (the header should be contiguous with the data). Linux 1.3.x makes life much simpler. It ensures that you have at least as much room as you requested, free at the start of the buffer. It is up to you to use skb_push() appropriately, as we discussed in the section on networking buffers.
The physical media addresses (if any) are maintained in dev_addr and broadcast respectively and are byte arrays. Addresses smaller than the size of the array are stored starting from the left. The addr_len field is used to hold the length of a hardware address. With many media there is no hardware address, and in this case, this field should be set to zero. For some other interfaces, the address must be set by a user program. The ifconfig tool permits the setting of an interface hardware address. In this case it need not be set initially, but the open code should take care not to allow a device to start transmitting before an address has been set.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- I once had a better way I
35 min 20 sec ago - Not only you I too assumed
52 min 43 sec ago - another very interesting
2 hours 45 min ago - Reply to comment | Linux Journal
4 hours 39 min ago - Reply to comment | Linux Journal
11 hours 33 min ago - Reply to comment | Linux Journal
11 hours 49 min ago - Favorite (and easily brute-forced) pw's
13 hours 40 min ago - Have you tried Boxen? It's a
19 hours 32 min ago - seo services in india
1 day 4 min ago - For KDE install kio-mtp
1 day 4 min ago




Comments
What about rmem_max / rmem_default ?
An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?
Thanks for anyone who would make it clear to me,
Telenn
Missing pictures
The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).
Thank you,
Ovy
Fixed
Should be working now.
Mitch Frazier is an Associate Editor for Linux Journal.
thnx
thanx for the great article..
at each layer the data and tail pointers change right??
so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??
Help Required....
Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.
i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.
Thanx in adavance,
good article
thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.
Ajay