Linux Network Programming, Part 1
Like most other Unix-based operating systems, Linux supports TCP/IP as its native network transport. In this series, we will assume you are fairly familiar with C programming on Linux and with Linux topics such as signals, forking, etc.
This article is a basic introduction to using the BSD socket interface for creating networked applications. In the next article, we will deal with issues involved in creating (network) daemon processes. Future articles will cover using remote procedure calls and developing with CORBA/distributed objects.
The TCP/IP suite of protocols allows two applications, running on either the same or separate computers connected by a network, to communicate. It was specifically designed to tolerate an unreliable network. TCP/IP allows two basic modes of operation—connection-oriented, reliable transmission and connectionless, unreliable transmission (TCP and UDP respectively). Figure 1 illustrates the distinct protocol layers in the TCP/IP suite stack.
Figure 1. TCP/IP Protocol Layers
TCP provides sequenced, reliable, bi-directional, connection-based bytestreams with transparent retransmission. In English, TCP breaks your messages up into chunks (not greater in size than 64KB) and ensures that all the chunks get to the destination without error and in the correct order. Being connection-based, a virtual connection has to be set up between one network entity and the other before they can communicate. UDP provides (very fast) connectionless, unreliable transfer of messages (of a fixed maximum length).
To allow applications to communicate with each other, either on the same machine (using loopback) or across different hosts, each application must be individually addressable.
TCP/IP addresses consist of two parts—an IP address to identify the machine and a port number to identify particular applications running on that machine.
The addresses are normally given in either the “dotted-quad” notation (e.g., 127.0.0.1) or as a host name (foobar.bundy.org). The system can use either the /etc/hosts file or the Domain Name System (DNS), if available, to translate host names to host addresses.
Port numbers are 16-bit values ranging from 1 to 65535. Ports below IPPORT_RESERVED (defined in /usr/include/netinet/in.h, typically 1024) are reserved for system use (i.e., you must be root to create a server that binds to these ports).
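As a rough illustration of how the two parts of an address fit together in code, the sketch below fills in a struct sockaddr_in from a dotted-quad string and a port number. The helper name make_ipv4_address() is hypothetical and is not taken from Listing 1; the port is assumed to be supplied in host byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in an IPv4 socket address from a dotted-quad
 * string and a port number given in host byte order.
 * Returns 0 on success, -1 if the address string is not valid. */
int make_ipv4_address(const char *dotted_quad, unsigned short port,
                      struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);   /* ports travel in network byte order */

    /* inet_pton() returns 1 only when the string was parsed successfully. */
    if (inet_pton(AF_INET, dotted_quad, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

Both the port and the IP address are stored in network byte order, which is why htons() is applied to the port before it is stored.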
The simplest network applications follow the client-server model. A server process waits for a client process to connect to it. When the connection is established, the server performs some task on behalf of the client and then usually the connection is broken.
The most popular method of TCP/IP programming is to use the BSD socket interface. With this, network endpoints (IP address and port number) are represented as sockets.
The socket interprocess communication (IPC) facilities (introduced with 4.2BSD) were designed to allow network-based applications to be constructed independently of the underlying communication facilities.
To create a server application using the BSD interface, you must follow these steps:
1. Create a new socket by calling socket().
2. Bind an address (IP address and port number) to the socket by calling bind(). This step identifies the server so that the client knows where to go.
3. Listen for new connection requests on the socket by calling listen().
4. Accept new connections by calling accept().
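The four steps above can be sketched roughly as follows. This is a minimal outline rather than the code from Listing 1: the helper names create_listener() and accept_client() are hypothetical, the address is assumed to be INADDR_ANY, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: perform steps 1-3 and return a listening socket,
 * or -1 on failure. Port 0 asks the kernel to pick any free port. */
int create_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);       /* step 1: socket() */
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* step 2 */
        listen(fd, backlog) < 0) {                               /* step 3 */
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 4: block until a client connects, then return the new connected
 * socket. The listening socket stays open for further connections. */
int accept_client(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

A caller would typically invoke create_listener() once and then call accept_client() in a loop, servicing each returned descriptor in turn.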
Often, the servicing of a request on behalf of a client may take a considerable length of time. It would be more efficient in such a case to accept and deal with new connections while a request is being processed. The most common way of doing this is for the server to fork a new copy of itself after accepting the new connection.
Figure 2. Representation of Client/Server Code
The code example in Listing 1 shows how servers are implemented in C. The program expects to be called with only one command-line argument: the port number to bind to. It then creates a new socket to listen on using the socket() system call. This call takes three parameters: the domain in which to listen, the socket type and the network protocol.
The domain can be either the PF_UNIX domain (i.e., internal to the local machine only) or the PF_INET (i.e., all requests from the Internet). The socket type specifies the communication semantics of the connection. While a few types of sockets have been specified, in practice, SOCK_STREAM and SOCK_DGRAM are the most popular implementations. SOCK_STREAM provides for TCP reliable connection-oriented communications, SOCK_DGRAM for UDP connectionless communication.
The protocol parameter identifies the particular protocol to be used with the socket. While multiple protocols may exist within a given protocol family (or domain), there is generally only one. For TCP this is IPPROTO_TCP, for UDP it is IPPROTO_UDP. You do not have to explicitly specify this parameter when making the function call. Instead, using a value of 0 will select the default protocol.
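To make the three parameters concrete, here is a small sketch that opens one TCP and one UDP socket. The function names are hypothetical; the first call relies on protocol 0 to select the default, while the second names the protocol explicitly.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open a TCP socket. Passing 0 as the protocol
 * selects the default protocol for SOCK_STREAM in the PF_INET domain. */
int open_tcp_socket(void)
{
    return socket(PF_INET, SOCK_STREAM, 0);
}

/* Hypothetical helper: open a UDP socket, this time naming the
 * protocol explicitly instead of relying on the default. */
int open_udp_socket(void)
{
    return socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
}
```

Both functions return a file descriptor on success and -1 on failure, exactly as socket() does.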
Once the socket is created, its operation can be tweaked by means of socket options. In Listing 1, the socket is set to reuse old addresses (i.e., IP address + port numbers) without waiting for the required connection close timeout. If this were not set, you would have to wait four minutes in the TIME_WAIT state before using the address again. The four minutes comes from 2 * MSL (maximum segment lifetime). The recommended value for MSL, from RFC 793, is 120 seconds. Linux uses 60 seconds; BSD implementations normally use around 30 seconds.
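Address reuse is requested with the SO_REUSEADDR socket option. The sketch below shows one way to set it; the wrapper name enable_address_reuse() is hypothetical, and the call is assumed to be made before bind().

```c
#include <sys/socket.h>

/* Hypothetical helper: let bind() reuse a local address that is still
 * in the TIME_WAIT state. Call this before bind().
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
int enable_address_reuse(int sock_fd)
{
    int on = 1;   /* any non-zero value switches the option on */
    return setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```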
The socket can linger to ensure that all data is read, once one end closes. This option is turned on in the code. The structure of linger is defined in /usr/include/linux/socket.h. It looks like this:
struct linger
{
    int l_onoff;   /* Linger active */
    int l_linger;  /* How long to linger */
};
If l_onoff is zero, lingering is disabled. If it is non-zero, lingering is enabled for the socket. The l_linger field specifies the linger time in seconds.
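A minimal sketch of turning lingering on might look like the following. The helper name enable_linger() is hypothetical, and the timeout is left to the caller because the value used in Listing 1 is not stated here.

```c
#include <sys/socket.h>

/* Hypothetical helper: enable lingering on a socket for the given
 * number of seconds. Returns 0 on success, -1 on error. */
int enable_linger(int sock_fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* non-zero: lingering is enabled */
    lg.l_linger = seconds;   /* how long close() may wait, in seconds */
    return setsockopt(sock_fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```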
The server then tries to discover its own host name. I could have used the gethostname() call, but the use of this function is deprecated in SVR4 Unix (i.e., Sun's Solaris, SCO Unixware and buddies), so the local function _GetHostName() provides a more portable solution.
Once the host name is established, the server constructs an address for the socket by trying to resolve the host name to an Internet domain address, using the gethostbyname() call. The server's IP address could instead be set to INADDR_ANY to allow a client to contact the server on any of its IP addresses—used, for example, with a machine with multiple network cards or multiple addresses per network card.
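The two choices described above, resolving a host name or falling back to INADDR_ANY, could be combined along these lines. This is a sketch under assumptions: build_server_address() is a hypothetical name, passing NULL is an invented convention for "any address", and only IPv4 results are handled. Note that gethostbyname() is considered obsolete on current systems, where getaddrinfo() is the usual replacement.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the address the server will bind to.
 * If host_name is NULL, INADDR_ANY is used so that clients can reach the
 * server on any local interface; otherwise the name is resolved with
 * gethostbyname(). Returns 0 on success, -1 if resolution fails. */
int build_server_address(const char *host_name, unsigned short port,
                         struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);

    if (host_name == NULL) {
        addr->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }

    struct hostent *host = gethostbyname(host_name);
    if (host == NULL || host->h_addrtype != AF_INET)
        return -1;

    /* Use the first address returned for the host. */
    memcpy(&addr->sin_addr, host->h_addr_list[0], sizeof(addr->sin_addr));
    return 0;
}
```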
After an address is created, it is bound to the socket. The socket can now be used to listen for new connections. The BACK_LOG specifies the maximum size of the listen queue for pending connections. If a connection request arrives when the listen queue is full, it will fail with a connection refused error. [This forms the basis for one type of denial of service attack —Ed.] See sidebar on TCP listen() Backlog.
Having indicated a willingness to listen to new connection requests, the socket then prepares to accept the requests and service them. The example code achieves this using an infinite for() loop. Once a connection has been accepted, the server can ascertain the address of the client for logging or other purposes. It then forks a child copy of itself to handle the request while it (the parent) continues listening for and accepting new requests.
The child process can use the read() and write() system calls on this connection to communicate with the client. It is also possible to use buffered I/O on these connections (e.g., fprintf()) as long as you remember to fflush() the output when necessary. Alternatively, you can disable buffering altogether for the stream (see the setvbuf(3) man page).
As you can see from the code, it is quite common (and good practice) for the child processes to close the inherited parent-socket file descriptor, and for the parent to close the child-socket descriptor when using this simple forking model.
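The descriptor handling in this forking model can be sketched as follows. This is not the code from Listing 1: the function names and the fixed greeting sent by handle_client() are hypothetical, and reaping of finished children (for example with waitpid() or a SIGCHLD handler) is left out for brevity.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request handler run in the child: send a fixed greeting. */
static void handle_client(int conn_fd)
{
    static const char greeting[] = "hello\n";
    ssize_t sent = write(conn_fd, greeting, sizeof(greeting) - 1);
    (void)sent;   /* a real server would check for short writes and errors */
}

/* Fork a child to service conn_fd.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t dispatch_to_child(int listen_fd, int conn_fd)
{
    pid_t pid = fork();
    if (pid == 0) {
        close(listen_fd);   /* child: drop the inherited listening socket */
        handle_client(conn_fd);
        close(conn_fd);
        _exit(0);
    }
    close(conn_fd);         /* parent: drop the per-client socket */
    return pid;
}

/* Accept one connection and hand it to a child process. */
pid_t accept_and_fork(int listen_fd)
{
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0)
        return -1;
    return dispatch_to_child(listen_fd, conn_fd);
}
```

A server would call accept_and_fork() inside its infinite loop. If the parent did not close its copy of the connected descriptor, the connection would stay open after the child exits and the parent would slowly run out of descriptors.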
Comments
Great tutorial. Thx. I also recommend Beej's tutorial.
REQ:multiple client - server communication
Good explanation for starters. I have a question: how is the server able to maintain communication with multiple clients? How does the server identify that a particular message has come from a particular client?
Help me out!
answer
Hi,
You have asked a nice question.
A) When multiple clients connect to a server, the server first calls listen() on its socket and then accepts connections with accept(). Each accept() creates another socket for that one client, while the original listening socket remains available for future connections. Because each client gets its own file descriptor, the server has a way of serving multiple clients.
You also asked how the server identifies the sender. This is done by your operating system, which maintains a table in the kernel recording which client is connected to which socket, so data from a given client arrives only on that client's descriptor.