Network Buffers and Memory Management
The semantics of allocating and queuing buffers for sockets also involve flow control rules and for sending a whole list of interactions with signals and optional settings such as non blocking. Two routines are designed to make this easy for most protocols.
The sock_queue_rcv_skb() function is used to handle incoming data flow control and is normally used in the form:
sk=my_find_socket(whatever);
if(sock_queue_rcv_skb(sk,skb)==-1)
{
myproto_stats.dropped++;
kfree_skb(skb,FREE_READ);
return;
}
This function uses the socket read queue counters to prevent vast amounts of data from being queued to a socket. After a limit is hit, data is discarded. It is up to the application to read fast enough, or as in TCP, for the protocol to do flow control over the network. TCP actually tells the sending machine to shut up when it can no longer queue data.
On the sending side, sock_alloc_send_skb() handles signal handling, the non-blocking flag and all the semantics of blocking until there is space in the send queue, so that you cannot tie up all of memory with data queued for a slow interface. Many protocol send routines have this function doing almost all the work:
skb=sock_alloc_send_skb(sk,....)
if(skb==NULL)
return -err;
skb->sk=sk;
skb_reserve(skb, headroom);
skb_put(skb,len);
memcpy(skb->data, data, len);
protocol_do_something(skb);
Most of this we have met before. The very important line is skb->sk=sk. The sock_alloc_send_skb() has charged the memory for the buffer to the socket. By setting skb->sk, we tell the kernel that whoever does a kfree_skb() on the buffer should credit the memory for the buffer to the socket. Thus, when a device has sent a buffer and freed it, the user is able to send more.
All Linux network devices follow the same interface, but many functions available in that interface are not needed for all devices. An object-oriented mentality is used, and each device is an object with a series of methods that are filled into a structure. Each method is called with the device itself as the first argument, in order to get around the lack of the C++ concept of this within the C language.
The file drivers/net/skeleton.c contains the skeleton of a network device driver. View or print a copy from a recent kernel and follow along throughout the rest of the article.
Figure 7 Structure of a Linux Network Device
Each network device deals entirely in the transmission of network buffers from the protocols to the physical media, and in receiving and decoding the responses the hardware generates. Incoming frames are turned into network buffers, identified by protocol and delivered to netif_rx(). This function then passes the frames off to the protocol layer for further processing.
Each device provides a set of additional methods for the handling of stopping, starting, control and physical encapsulation of packets. All of the control information is collected together in the device structures that are used to manage each device.
All Linux network devices have a unique name that is not in any way related to the file system names devices may have. Indeed, network devices do not normally have a file system representation, although you can create a device which is tied to the device drivers. Traditionally the name indicates only the type of a device rather than its maker. Multiple devices of the same type are numbered upwards from 0; thus, Ethernet devices are known as “eth0”, “eth1”, “eth3” etc. The naming scheme is important as it allows users to write programs or system configuration in terms of “an Ethernet card” rather than worrying about the manufacturer of the board and forcing reconfiguration if a board is changed.
The following names are currently used for generic devices:
ethn Ethernet controllers, both 10 and 100Mbit/second
trn Token ring devices
sln SLIP devices and AX.25 KISS mode
pppn PPP devices both asynchronous and synchronous
plipn PLIP units; the number matches the printer port
tunln IPIP encapsulated tunnels
nrn NetROM virtual devices
isdnn ISDN interfaces handled by isdn4linux (*)
dummyn Null devices
lo The loopback device
(*) At least one ISDN interface is an Ethernet impersonator—the Sonix PC/Volante driver behaves in all aspects as if it was Ethernet rather than ISDN; therefore, it uses an “eth” device name. If possible, a new device should pick a name that reflects existing practice. When you are adding a whole new physical layer type, you should look for other people working on such a project and use a common naming scheme.
Certain physical layers present multiple logical interfaces over one media. Both ATM and Frame Relay have this property, as does multi-drop KISS in the amateur radio environment. Under such circumstances, a driver needs to exist for each active channel. The Linux networking code is structured in such a way as to make this manageable without excessive additional code. Also, the name registration scheme allows you to create and remove interfaces almost at will as channels come into and out of existence. The proposed convention for such names is still under some discussion, as the simple scheme of “sl0a”, “sl0b”, “sl0c” works for basic devices like multidrop KISS, but does not cope with multiple frame relay connections where a virtual channel can be moved across physical boards.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Using Salt Stack and Vagrant for Drupal Development
- Build a Skype Server for Your Home Phone System
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Why Python?
- A Topic for Discussion - Open Source Feature-Richness?
- Tech Tip: Really Simple HTTP Server with Python
- Great
56 min 24 sec ago - Reply to comment | Linux Journal
1 hour 4 min ago - Understanding the Linux Kernel
3 hours 19 min ago - General
5 hours 48 min ago - Kernel Problem
15 hours 51 min ago - BASH script to log IPs on public web server
20 hours 18 min ago - DynDNS
23 hours 54 min ago - Reply to comment | Linux Journal
1 day 26 min ago - All the articles you talked
1 day 2 hours ago - All the articles you talked
1 day 2 hours ago




Comments
What about rmem_max / rmem_default ?
An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?
Thanks for anyone who would make it clear to me,
Telenn
Missing pictures
The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).
Thank you,
Ovy
Fixed
Should be working now.
Mitch Frazier is an Associate Editor for Linux Journal.
thnx
thanx for the great article..
at each layer the data and tail pointers change right??
so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??
Help Required....
Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.
i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.
Thanx in adavance,
good article
thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.
Ajay