Network Buffers and Memory Management
The semantics of allocating and queuing buffers for sockets also involve flow control rules and for sending a whole list of interactions with signals and optional settings such as non blocking. Two routines are designed to make this easy for most protocols.
The sock_queue_rcv_skb() function is used to handle incoming data flow control and is normally used in the form:
sk=my_find_socket(whatever);
if(sock_queue_rcv_skb(sk,skb)==-1)
{
myproto_stats.dropped++;
kfree_skb(skb,FREE_READ);
return;
}
This function uses the socket read queue counters to prevent vast amounts of data from being queued to a socket. After a limit is hit, data is discarded. It is up to the application to read fast enough, or as in TCP, for the protocol to do flow control over the network. TCP actually tells the sending machine to shut up when it can no longer queue data.
On the sending side, sock_alloc_send_skb() handles signal handling, the non-blocking flag and all the semantics of blocking until there is space in the send queue, so that you cannot tie up all of memory with data queued for a slow interface. Many protocol send routines have this function doing almost all the work:
skb=sock_alloc_send_skb(sk,....)
if(skb==NULL)
return -err;
skb->sk=sk;
skb_reserve(skb, headroom);
skb_put(skb,len);
memcpy(skb->data, data, len);
protocol_do_something(skb);
Most of this we have met before. The very important line is skb->sk=sk. The sock_alloc_send_skb() has charged the memory for the buffer to the socket. By setting skb->sk, we tell the kernel that whoever does a kfree_skb() on the buffer should credit the memory for the buffer to the socket. Thus, when a device has sent a buffer and freed it, the user is able to send more.
All Linux network devices follow the same interface, but many functions available in that interface are not needed for all devices. An object-oriented mentality is used, and each device is an object with a series of methods that are filled into a structure. Each method is called with the device itself as the first argument, in order to get around the lack of the C++ concept of this within the C language.
The file drivers/net/skeleton.c contains the skeleton of a network device driver. View or print a copy from a recent kernel and follow along throughout the rest of the article.
Figure 7 Structure of a Linux Network Device
Each network device deals entirely in the transmission of network buffers from the protocols to the physical media, and in receiving and decoding the responses the hardware generates. Incoming frames are turned into network buffers, identified by protocol and delivered to netif_rx(). This function then passes the frames off to the protocol layer for further processing.
Each device provides a set of additional methods for the handling of stopping, starting, control and physical encapsulation of packets. All of the control information is collected together in the device structures that are used to manage each device.
All Linux network devices have a unique name that is not in any way related to the file system names devices may have. Indeed, network devices do not normally have a file system representation, although you can create a device which is tied to the device drivers. Traditionally the name indicates only the type of a device rather than its maker. Multiple devices of the same type are numbered upwards from 0; thus, Ethernet devices are known as “eth0”, “eth1”, “eth3” etc. The naming scheme is important as it allows users to write programs or system configuration in terms of “an Ethernet card” rather than worrying about the manufacturer of the board and forcing reconfiguration if a board is changed.
The following names are currently used for generic devices:
ethn Ethernet controllers, both 10 and 100Mbit/second
trn Token ring devices
sln SLIP devices and AX.25 KISS mode
pppn PPP devices both asynchronous and synchronous
plipn PLIP units; the number matches the printer port
tunln IPIP encapsulated tunnels
nrn NetROM virtual devices
isdnn ISDN interfaces handled by isdn4linux (*)
dummyn Null devices
lo The loopback device
(*) At least one ISDN interface is an Ethernet impersonator—the Sonix PC/Volante driver behaves in all aspects as if it was Ethernet rather than ISDN; therefore, it uses an “eth” device name. If possible, a new device should pick a name that reflects existing practice. When you are adding a whole new physical layer type, you should look for other people working on such a project and use a common naming scheme.
Certain physical layers present multiple logical interfaces over one media. Both ATM and Frame Relay have this property, as does multi-drop KISS in the amateur radio environment. Under such circumstances, a driver needs to exist for each active channel. The Linux networking code is structured in such a way as to make this manageable without excessive additional code. Also, the name registration scheme allows you to create and remove interfaces almost at will as channels come into and out of existence. The proposed convention for such names is still under some discussion, as the simple scheme of “sl0a”, “sl0b”, “sl0c” works for basic devices like multidrop KISS, but does not cope with multiple frame relay connections where a virtual channel can be moved across physical boards.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Home, My Backup Data Center
- A Topic for Discussion - Open Source Feature-Richness?
- Dart: a New Web Programming Experience
- Developer Poll
- What's the tweeting protocol?
- May 2013 Issue of Linux Journal: Raspberry Pi
- Reply to comment | Linux Journal
53 min 30 sec ago - great post
1 hour 28 min ago - Google Docs
1 hour 50 min ago - Reply to comment | Linux Journal
6 hours 39 min ago - Reply to comment | Linux Journal
7 hours 26 min ago - Web Hosting IQ
8 hours 59 min ago - Thanks for taking the time to
10 hours 36 min ago - Linux is good
12 hours 34 min ago - Reply to comment | Linux Journal
12 hours 51 min ago - Web Hosting IQ
13 hours 21 min ago




Comments
What about rmem_max / rmem_default ?
An admirable in-depth article. Just a stupid question (I'm so slow-witted) : I still don't catch the link between the rmem_default/rmem_max sysctl parameters (socket receive buffer default/max length) and the buffer allocated by dev_alloc_skb(). Socket receive buffer vs buffer of skb : are we talking about he same memory area, or are they different things (involving necessarily a copy from the one to the other, sooner or later) ?
Thanks for anyone who would make it clear to me,
Telenn
Missing pictures
The links to figures do not work (File not found error). I guess time does matter (1996 article!). To anyone reading this article, please provide us some links for the pictures (or link to some other up to date articles).
Thank you,
Ovy
Fixed
Should be working now.
Mitch Frazier is an Associate Editor for Linux Journal.
thnx
thanx for the great article..
at each layer the data and tail pointers change right??
so if i need to acces the L7 data,consider UDP can i take the from pre routing hook can i take data+udphdr->length..??
Help Required....
Hi Alan Cox,
Thanx for the article.
Iam Ram.Iam new to device driver development.
some how i manged to write a network driver.
still i need some help.But I want to access the driver functions directly from user program written in c.
i.e. I want to access the open,close,hard_start_xmit(),ioctl functions directly without using the socket api(socket,bind,connect etc). I want my own function api.
is it possible to do it.
Thanx in adavance,
good article
thanks for this article. It explains most of the things. But still I feel that some more thing related to Bottom Half/Top half processing should be added. and also things are not clear about the logic of freeing/owning skbuffers.
Ajay