Internet Servers in Perl
In my previous article in Issue #35 of Linux Journal, I wrote about the socket library functions in Perl with an emphasis on writing Internet client programs. Perl is also a good language for Internet servers, not only because of the socket capabilities and the ease of dealing with files and data, but because it also has a special mode for improving security. In this article I cover several aspects of writing Perl servers, including how to use the basic socket functions, how to best handle multiple connections, asynchronous communication and security issues. In the process we'll develop a simple Internet server similar to fingerd that works through the Web.
Socket communication may be either connection-oriented or connectionless. Connection-oriented protocols, like the Internet's Transmission Control Protocol (TCP), establish a link between client and server before exchanging any data. Connectionless protocols, like the User Datagram Protocol (UDP), simply read or write data, specifying the client or server address each time. Most servers use a connection-oriented scheme, and we use this approach in our example server (see Listing 1). However, I discuss the connectionless approach below.
Any Internet server, from the simplest to the most complicated, first uses the two functions socket and bind to establish an identifiable communications endpoint. The server uses socket to create a socket with the desired type and protocol. Recall the syntax for this function is:
socket SOCKET, DOMAIN, TYPE, PROTOCOL
SOCKET here is a Perl file handle initialized by the call to socket. For Internet TCP applications DOMAIN is AF_INET and TYPE is SOCK_STREAM. The Perl 5 Socket package defines the constants AF_INET and SOCK_STREAM as well as other socket-related constants and functions; refer to the previous article for details.
An Internet server must bind a network address to the socket with the bind function. A client can bind an address, but it is not usually necessary in connection-oriented clients. This is also referred to as “naming the socket”. This process specifies the network address to which a client must connect to start communicating with the server. The syntax of bind is:
bind SOCKET, NAME
The SOCKET argument is still the file handle created by the call to socket. NAME is the address that is bound to the socket. The contents of this argument can be quite complicated (again, refer to the previous article for details). For versions of Perl from 5.2 on, a function in the Socket package called sockaddr_in returns a value for the NAME argument given a port number and an Internet host address. If you're writing something like an ftp or HTTP server, you can use the reserved “well-known” port number (see the file /etc/services for these numbers). Otherwise, any positive 16-bit integer will suffice as long as it is not one of the reserved numbers. For servers the special argument INADDR_ANY can be used for the Internet address, which lets the socket accept connections on any of the host's network addresses.
For connection-oriented servers like our example program we now can use the listen function to tell the operating system that we'll accept connections on the socket. This function looks like this:
listen SOCKET, QUEUESIZE
We all know what SOCKET is by now. QUEUESIZE specifies the number of attempted connections that can be kept waiting; the symbol SOMAXCONN is the maximum for this argument (usually 5). This lets the server handle several near-simultaneous connection requests, a crucial feature for HTTP servers or daemons like inetd.
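The socket, bind and listen steps described so far can be sketched together as follows. This is only a minimal sketch, not the article's Listing 1: the helper name make_listener, the use of port 0 (which asks the kernel for any free port) and the SO_REUSEADDR option are assumptions added for illustration.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: create a TCP socket, name it, and start listening.
sub make_listener {
    my ($port) = @_;
    my $proto = getprotobyname('tcp');    # protocol number in scalar context
    socket(my $sock, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    # Assumption: allow quick restarts while old connections time out.
    setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, pack('l', 1))
        or die "setsockopt: $!";
    bind($sock, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
    listen($sock, SOMAXCONN) or die "listen: $!";
    return $sock;
}

# Port 0 lets the kernel choose; getsockname reports what it picked.
my $listener = make_listener(0);
my ($bound_port) = sockaddr_in(getsockname($listener));
print "listening on port $bound_port\n";
```

A real server would pass its own fixed port number instead of 0 so that clients know where to connect.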
Now a client program could attempt to connect to the server, but we need more code to actually create the link. For many servers, the accept function is called, typically in a loop of some sort, directly after listen. The syntax of accept is:
accept NEWSOCKET, GENERICSOCKET
This function opens NEWSOCKET, a file handle that you can read from or write to in order to communicate with the connecting client. GENERICSOCKET is any open, named socket. For our server, this is the named socket we've already created with socket and bind. accept returns the address of the connecting client in the same form as the NAME argument to bind.
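To see accept in action without a second program, the following sketch connects a stand-in client to the listening socket from within the same process. The loopback address, the kernel-chosen port and the variable names are assumptions made for the demonstration; a real server would call accept in a loop and wait for outside clients.

```perl
use strict;
use warnings;
use Socket;

my $proto = getprotobyname('tcp');

# Named, listening socket on the loopback interface; port 0 = any free port.
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($server, SOMAXCONN) or die "listen: $!";
my ($port) = sockaddr_in(getsockname($server));

# Stand-in client; the kernel queues its connection until we accept it.
socket(my $client, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

# accept opens a new handle for this one client and returns a packed address.
my $peer = accept(my $conn, $server) or die "accept: $!";
my ($peer_port, $peer_ip) = sockaddr_in($peer);
my $peer_host = inet_ntoa($peer_ip);

# Talk to the client over the new handle, not the listening socket.
syswrite($conn, "hello\n");
close($conn);
my $greeting = <$client>;
close($client);
print "client $peer_host:$peer_port received: $greeting";
```

The listening socket stays open after the exchange, so the same accept call could be repeated for the next client.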
Note that the accept call waits until a connection request arrives, so no processing can occur until it completes. This usually poses no problem since it matches the way most servers work: they wait for a request and then service it. Sometimes, though, an application might perform other tasks, like calculation or system monitoring, that can't be stopped to wait for client connections. If so, communication can be done asynchronously—that is, processing can be interrupted temporarily using a signal handler to make the socket connection and to process the client's request. I don't cover this in detail since that requires a lengthy digression into the fcntl system call and signal handlers, but Listing 2 illustrates the basic idea.
The Internet's main connectionless protocol is called UDP, or User Datagram Protocol. A datagram contains all of the information required to send it to the right place. UDP does not guarantee reliability; extra user code must deal with problems caused by packets that don't make it to their destinations. For a connectionless server, listen and accept are not needed. A connectionless client usually does need to use bind so that a valid return address gets passed to the server in the client's data packets, but we won't worry about the client side here. To use UDP on our socket rather than TCP, we simply replace the socket argument SOCK_STREAM with SOCK_DGRAM and the getprotobyname argument tcp with udp.
In C we use the system functions sendto and recvfrom to send data between client and server with UDP, but Perl doesn't implement these directly. Instead, Perl uses send and recv for both connection-oriented and connectionless protocols. After setting up the socket with socket and bind, a connectionless server would usually call recv:
recv SOCKET, SCALAR, LEN, FLAGS
This function blocks until data becomes available on SOCKET, then reads up to LEN bytes into the scalar variable SCALAR. FLAGS are the same flags as for the recv system call. recv returns the address of the client, which can then be used to send information back with the send function:
send SOCKET, MSG, FLAGS, TO
TO is the client address. The socket code in the simplest connectionless server would look something like this:
socket(S, AF_INET, SOCK_DGRAM, getprotobyname('udp'));
bind(S, sockaddr_in($port, INADDR_ANY));
$cli_addr = recv S, $request, 80, 0;
send S, $message, 0, $cli_addr;
Now back to our TCP server. Remember I mentioned earlier that several connection requests can get queued up so the server can respond to each in turn. This might be inefficient (and probably annoying to the client user) if the server does something that takes a significant amount of time, like querying a database or running an external program. To get around this problem, many servers fork a new process to handle a request once they accept a connection. Look at our example server code for details. The only slightly tricky part is the CHLD signal handler used to clean up zombie processes.
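The fork-per-connection pattern and the CHLD handler can be sketched roughly like this. It is an illustration under assumptions rather than the code from Listing 1: the helper name accept_and_fork, the handler callback and the in-process stand-in client are all hypothetical.

```perl
use strict;
use warnings;
use Socket;
use POSIX qw(WNOHANG);

# Reap every finished child so none lingers as a zombie.
$SIG{CHLD} = sub {
    local ($!, $?);
    1 while waitpid(-1, WNOHANG) > 0;
};

# Hypothetical helper: accept one connection and hand it to a child process.
sub accept_and_fork {
    my ($server, $handler) = @_;
    accept(my $conn, $server) or return;   # may fail if a signal interrupts it
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                       # child: serve the client, then exit
        close($server);
        $handler->($conn);
        close($conn);
        exit 0;
    }
    close($conn);                          # parent: the child owns it now
    return $pid;
}

# Demonstration with a stand-in client on the loopback interface.
my $proto = getprotobyname('tcp');
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($server, SOMAXCONN) or die "listen: $!";
my ($port) = sockaddr_in(getsockname($server));

socket(my $client, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

my $pid = accept_and_fork($server, sub {
    my ($conn) = @_;
    syswrite($conn, "served by child $$\n");   # $$ is the child's process ID
});
my $reply = <$client>;
close($client);
print $reply;
```

A long-running server would call accept_and_fork inside a loop and simply try again when accept fails because the CHLD signal interrupted it. The parent closes its copy of the connection right away so that it is free to accept the next client while the child does the slow work.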
Servers often run as setuid or setgid programs, meaning the processes have the privileges of the user or group that owns the executable file regardless of who runs the program. At the very least, a server program will run under your own user ID. Since anyone can, in principle, use an Internet server, you can see security is of the utmost importance. You must make sure the server does not give privileged access to important system files or your own confidential data. Usually this requires checks on environment variables, file privileges, external program execution, etc., so it's hard to be thorough. Fortunately, Perl helps us out here with its taint mode, a mode that checks for common security violations. The -T command line switch turns on this mode, so we just add this to the “shebang” line at the top of the script.
The exec function in the example server might cause security concerns for at least two reasons. First, executing an external program implies the use of the PATH environment variable. This variable is considered to be tainted until we set it explicitly in the script, since it could be modified to cause the execution of a program other than the one we intended. Second, we separate the arguments to exec into the program name and the argument list, which prevents exec from calling the shell to do metacharacter substitutions. If these modifications were not made, the taint mode would send warnings to the terminal and stop the program (in fact, that's how I found these problems). Keep in mind taint mode does not guarantee security, but it does make it much easier to identify well-known problems.
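The two precautions just described, plus a check on client-supplied data, might look something like the sketch below. The helper names, the PATH value, the user-name pattern and the echo command are assumptions chosen for illustration, and system stands in for exec so that the script keeps running afterward. The -T switch is left off the first line so the sketch also runs without it; to try it under taint mode, run it with perl -T.

```perl
use strict;
use warnings;

# Give PATH a known value and drop variables that can alter how programs run.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Hypothetical helper: accept only a plain user name. Under taint mode,
# text captured from a successful pattern match is considered clean.
sub clean_username {
    my ($raw) = @_;
    return $raw =~ /^([A-Za-z0-9_][A-Za-z0-9_.-]{0,31})\z/ ? $1 : undef;
}

# Hypothetical helper: run a program with an explicit argument list so
# that no shell ever interprets the arguments.
sub run_without_shell {
    my ($program, @args) = @_;
    return system { $program } $program, @args;
}

my $user = clean_username('mike');
my $bad  = clean_username('mike; rm -rf /');    # rejected: returns undef
my $status = run_without_shell('echo', 'looking up', $user);
```

Because the arguments are passed as a list, a string such as the rejected one above would reach the program as a single literal argument even if it slipped through, rather than being split and run by a shell.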
Network servers are among the most complex pieces of software, which is to say, you should by no means consider this article a comprehensive treatment of the subject. Still, you'll be surprised to find how many of the elements of our simple example program show up in even large, complicated servers. Perl does reduce some of the complexity though, since you already have convenient tools at hand to do the hard parts, like parsing protocols and manipulating files. Even if you ultimately decide to write the program in C or some other compiled language, Perl can't be surpassed for prototyping server applications. The price is right too, but I don't need to convince Linux users of the value of “free” software.
Mike Mull writes software to simulate sub-microscopic objects. Stranger still, people pay him to do this. Mike thinks Linux is nifty. His favorite programming project is his 2-year-old son, Nathan. Mike can be reached at firstname.lastname@example.org.