DIPC: The Linux Way of Distributed Programming
Before Linux, powerful UNIX operating systems were considered a luxury. Linux made it possible for ordinary people to have access to an affordable and reliable computing platform. The only problem is that Linux was originally based on decades-old designs (see Resources 7), making it less attractive for more technically minded users. Linux's answer to this problem is either port and adaptation or introduction of newer concepts.
Building multi-computers (see Resources 1) and programming them are among the more popular research subjects and demand for them is rapidly rising. Any solution to distributed programming under Linux should keep up with one of Linux's more important features: availability to ordinary users.
Linux already had symmetric multi-processing capabilities. However, it did not provide enough standard facilities for distributed software development. Programmers and users had to resort to different add-on packages and various programming models to write or use distributed software. The mechanisms provided by these packages usually differed greatly from one another, each requiring users to learn some new material which was not of any use to them when migrating to other methods. Many also required detailed involvement of the programmer in the process of transferring data over the network; an example is the PVM (Parallel Virtual Machine) software (see Resources 8). The need for a simpler distributed programming model, usable by more programmers, was obvious.
DIPC (Distributed Inter-Process Communication) is a software-only solution for enabling people to build and program multiple computers easily. Each node can be an ordinary personal computer. These nodes must be connected to each other by a TCP/IP (see Resources 3) network. It does not use network broadcasting, which helps it work in networks without such capabilities. Also, no assumption of a synchronized clock is made. These features mean that DIPC could be used in a wide area network (WAN).
Right from the start, it was decided that ease of application programming and the simplicity of the DIPC itself should be among the most important factors in the system design, even if it were to mean some loss in performance. This decision was backed by the observation that computing and telecommunications equipment speeds are improving rapidly, while training and programming times for distributed applications are not.
In DIPC, UNIX System V IPC mechanisms (see Resources 4), consisting of semaphores, messages and shared memories, are modified to function in a network environment. This means that installing DIPC requires changing and recompiling the kernel. Here, the same system calls used to provide communication between processes running on the same computer can be used to allow the communication of processes running on different machines. There is no new system call for the application programmer's use. There is also no library to be linked to the application code, and no need for any modifications in compilers. DIPC could be used with any language that allows access to the operating system's system calls. It is completely camouflaged in the kernel.
The result is that DIPC supports both the message passing and the distributed shared memory paradigms of distributed programming, providing more options for the application programmer (see Resources 5). It also allows the processes to share only selected parts of their address space in order to reduce the problems of false sharing.
It was decided to implement DIPC in the user space as much as possible, with minimal changes to the kernel. This leads to a cleaner and simpler design, but in a monolithic operating system such as Linux it has the drawback of requiring frequent copy operations between kernel and user address spaces (see Resources 2). As UNIX does not allow user space processes to access and change kernel data structures at will, DIPC must have two parts. The more important part is a program named dipcd, which runs with superuser privileges; dipcd forks several processes to do its work. The other part is inside the kernel giving dipcd work and letting it see and manipulate kernel data. The two parts use a private system call to exchange data. This system call must not be used by other processes in the system.
DIPC provides easy data transfer over the network and assumes that the code to use these data already resides at the suitable places. This is justifiable when one considers that in most cases, the program's code changes much less frequently than the data.
DIPC is only concerned with providing mechanisms for distributed programming. The policies, e.g., how a program is parallelized, or where an application program's processes should run, are determined by the programmer or the end user.
DIPC enables the creation of clusters of PCs. Computers in the same cluster could work together to solve a problem. DIPC's clusters are logical entities, meaning they are independent of any physical network characteristics. Computers could be added or deleted from a cluster without the need to change any of the network parameters. Several clusters may exist in the same physical network, but each computer can belong to at most one of them. Computers on the same cluster can even be connected to each other by a WAN. As far as DIPC is concerned, computers in one cluster never interact with computers in other clusters.
In normal System V IPC, processes specify numerical keys to gain access to the same IPC structure (see Resources 4). They can then use these structures to communicate with each other. A key normally has a unique meaning in only one computer. DIPC makes the IPC keys globally known. Here, if the application programmer wishes, a key can have the same meaning in more than one machine. Processes on different computers can communicate with each other the same way they did in a single machine.
Information about all the IPC keys in use is kept by one of dipcd's processes called the referee. Each cluster has only one referee. In fact, it is having the same referee that places computers in the same cluster. All other processes in the cluster refer to this one to find out if a key is in use. The referee is DIPC's name server. Besides many other duties, the referee also makes sure that only one computer at a time will attempt to create an IPC structure with a given key value, hence the name. Using a central entity simplifies the design and implementation but can become a bottleneck in large configurations. Finding a remedy to this problem is left to the time when DIPC is actually running in such configurations.
Users may need to run some programs (e.g., utilities) in all the computers in the system at the same time, and these programs may need to use the same IPC keys. This could create interference. To prevent any unwanted interactions, distributed IPC structures are declared by programmers. The programmer must specify a flag to do this. The structures are local by default. The mentioned flag is the only thing the programmer should do to create a distributed program. The rest is like ordinary System V IPC programming. Other than this flag to keep DIPC compatible with older programs, the system is totally transparent to programmers.
DIPC's programming model is simple and quite similar to using ordinary System V IPC. First, a process creates and initializes the needed IPC structures. After that, other processes are started to collaborate on the job. All of them can access the same IPC structures and exchange data. These processes are usually executing in remote machines, but they could also be running on the same computer, meaning distributed programs can be written on a single machine and later run on real multi-computers.
An important point about DIPC is that no other UNIX facility is changed to work in a distributed environment. Thus, programmers cannot use system calls, such as fork, which create a process in the local computer.
The fact that DIPC programs use numerical keys to transfer data means they do not need to know where the corresponding IPC structures are located. DIPC makes sure that processes find the needed resources just by using the specified keys. The resources could be located in different computers during different runs of a distributed program. This logical addressing of resources makes the programs independent of any physical network characteristics.
Simple techniques allow the mapping from logical computing resources needed by a program to physical resources to be done with no need to remake the program. As DIPC programs do not need to use any physical network addresses, they do not need recompiling to run in new environments. Of course, this does not prevent the programmer from choosing to make his program dependent on some physical system characteristics. For example, he could hard code a computer address in his code. DIPC programmers are discouraged from doing this type of coding.
When dipcd is not running, the kernel parts of DIPC are short circuited, causing the system to behave like a normal Linux operating system. As a result, users can easily disable the distributed system. Also, normal Linux kernels are not affected by DIPC programs, meaning there is no need to change and recompile these programs when they are to be executed in single computers with no DIPC support.
Distributed Shared Memory (DSM) (see Resources 6) in DIPC uses a multiple-readers/single-writer protocol. DIPC replicates the contents of the shared memory in each computer with reader processes so they can work in parallel, but there can be only one computer with processes that write to a shared memory. The strict consistency model is used here, meaning that a read will return the most recently written value. It also means there is no need for the programmer to do any special synchronization activity when accessing a distributed shared memory segment. The major disadvantage with this scheme is a possible loss of performance in comparison to other DSM consistency models.
DIPC can be configured to provide a segment-based or page-based DSM. In the first case, DIPC transfers the entire contents of the shared memory from computer to computer, with no regard as to whether all data will be used. This could reduce the data transfer administration time. In the page-based mode, 4KB pages are transferred as needed, making possible multiple parallel writes to different pages.
In DIPC, each computer is allowed to access the shared memory for at least a configurable time quantum. This lessens the chance of the shared memory being transferred frequently over the network, which could result in bad performance.
DIPC assumes a fail-stop (see Resources 9) distributed environment, so it employs time-outs to find out about any problem. The at-most-once semantics (see Resources 1) is used here, meaning DIPC tries everything just once. In case of error, it simply informs the relevant processes, either by a system call return value or, for shared memory read/writes, by a signal. DIPC itself does not do anything to overcome the problem. The user processes should decide how to deal with the error. This is normal behavior in many other cases in UNIX .
It is important to provide some means to make sure that the data are accessed only by people with proper permissions. DIPC uses login names, not user IDs, to identify users. Remote operations are performed after assuming the identity of the person who executed the system call originally. For this to work, one login name on all computers in a DIPC cluster should denote the same person.
In order to prevent intrusion to DIPC clusters, addresses of the computers allowed to take part in a cluster should be put in a file for DIPC to consult.
DIPC is under development mainly in the Iran University of Science and Technology's (IUST) Department of Computer Engineering, but people from different parts of the world are currently working on it. A port to Linux for Motorola 680x0 processors has been completed. This made DIPC a heterogeneous system, as the two versions can communicate with each other. DIPC's sources and related documents can be found on the Internet via anonymous FTP at sunsite.unc.edu, in /pub/Linux/system/network/distrib/, or can be downloaded from DIPC's web page at http://wallybox.cei.net/dipc/.
DIPC belongs to the Linux users community, and the ultimate goal is to make it an integral part of the Linux operating system. Considering the inadequacy of computing and informational facilities in IUST, the only way to make sure this software will survive is for interested people to join in its development.
To subscribe to DIPC's mailing list, send e-mail to email@example.com with the body containing “subscribe linux-dipc”. Postings should go to firstname.lastname@example.org.
DIPC is a simple distributed system that works by bringing new functionality to an IPC system designed decades ago. Many of the DIPC's nicer features are the result of its being hidden inside the kernel. Considering its main characteristics, DIPC has the potential to introduce ordinary programmers to distributed programming, thus making Linux one of the first operating systems with usable and really used distributed programming facilities.
Several experimental distributed systems are available for use. Many of them have been implemented in universities running UNIX variants on workstations produced by different manufacturers. The fact that, in most cases, researchers did not have free access to the underlying operating system's source code has had a big influence on the design decisions. The availability of source code in Linux has provided new ways to deal with the problems of distributed programming. DIPC is an example of what can be done when one has access to the operating system sources. Some could mention the problems in porting DIPC to proprietary operating systems with no publicly available source code as a drawback. However, in our opinion, proprietary operating system vendors and their users are the ones at a loss here, as they cannot take advantage of more easy-to-use distributed systems developed by third parties. This statement does not mean DIPC could not be implemented in other UNIX variants supporting System V IPC, but implies that the port can only be attempted by people with access to kernel source code.