Network Buffers and Memory Management
It is necessary for the high level protocols to append low level headers to each frame before queuing it for transmission. It is also clearly undesirable that the protocol know in advance how to append low level headers for all possible frame types. Thus, the protocol layer calls down to the device with a buffer that has at least dev->hard_header_len bytes free at the start of the buffer. It is then up to the network device to correctly call skb_push() and to put the header on the packet using the dev->hard_header() method. Devices with no link layer header, such as SLIP, may have this method specified as NULL.
The method is invoked by giving the buffer concerned, the device's pointers, its protocol identity, pointers to the source and destination hardware addresses and the length of the packet to be sent. As the routine can be called before the protocol layers are fully assembled, it is vital that the method use the length parameter, not the buffer length.
The source address can be NULL to mean “use the default address of this device”, and the destination can be NULL to mean “unknown”. If as a result of an unknown destination, the header can not be completed, the space should be allocated and any bytes that can be filled in should be filled in. The function must then return the negative of the bytes of header added. This facility is currently only used by IP when ARP processing must take place. If the header is completely built, the function must return the number of bytes of header added to the beginning of the buffer.
When a header cannot be completed the protocol layers will attempt to resolve the necessary address. When this situation occurs, the dev->rebuild_header() method is called with the address at which the header is located, the device in question, the destination IP address and the network buffer pointer. If the device is able to resolve the address by whatever means available (normally ARP), then it fills in the physical address and returns 1. If the header cannot be resolved, it returns 0 and the buffer will be retried the next time the protocol layer has reason to believe resolution will be possible.
There is no receive method in a network device, because it is the device that invokes processing of such events. With a typical device, an interrupt notifies the handler that a completed packet is ready for reception. The device allocates a buffer of suitable size with dev_alloc_skb(), and places the bytes from the hardware into the buffer. Next, the device driver analyses the frame to decide the packet type. The driver sets skb->dev to the device that received the frame. It sets skb->protocol to the protocol the frame represents, so that the frame can be given to the correct protocol layer. The link layer header pointer is stored in skb->mac.raw, and the link layer header removed with skb_pull() so that the protocols need not be aware of it. Finally, to keep the link and protocol isolated, the device driver must set skb->pkt_type to one of the following:
PACKET_BROADCAST Link layer broadcast
PACKET_MULTICAST Link layer multicast
PACKET_SELF Frame to us
PACKET_OTHERHOST Frame to another single host
This last type is normally reported as a result of an interface running in promiscuous mode.
Finally, the device driver invokes netif_rx() to pass the buffer up to the protocol layer. The buffer is queued for processing by the networking protocols after the interrupt handler returns. Deferring the processing in this fashion dramatically reduces the time interrupts are disabled and improves overall responsiveness. Once netif_rx() is called, the buffer ceases to be property of the device driver and can not be altered or referred to again.
Flow control on received packets is applied at two levels by the protocols. First, a maximum amount of data can be outstanding for netif_rx() to process. Second, each socket on the system has a queue which limits the amount of pending data. Thus, all flow control is applied by the protocol layers. On the transmit side a per device variable dev->tx_queue_len is used as a queue length limiter. The size of the queue is normally 100 frames, which is large enough that the queue will be kept well filled when sending a lot of data over fast links. On a slow link such as a slip link, the queue is normally set to about 10 frames, as sending even 10 frames is several seconds of queued data.
One piece of magic that is done for reception with most existing devices, and one that you should implement if possible, is to reserve the necessary bytes at the head of the buffer to land the IP header on a long word boundary. The existing Ethernet drivers thus do:
skb=dev_alloc_skb(length+2); if(skb==NULL) return; skb_reserve(skb,2); /* then 14 bytes of ethernet hardware header */
to align IP headers on a 16 byte boundary, which is also the start of a cache line and helps give performance improvements. On the SPARC or DEC Alpha these improvements are very noticeable.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Unikernels, Docker, and Why You Should Care
- Happy GPL Birthday VLC!
- Handheld Emulation: Achievement Unlocked!
- Giving Silos Their Due
- Controversy at the Linux Foundation
- Don't Burn Your Android Yet
- Wine 1.8 Released
- New Products