Building a Distributed Spreadsheet in Modula-3
Back when Borland introduced Turbo Pascal 1.0, Philippe Kahn did something shrewd: he included the source code for a simple spreadsheet, which is why many programmers bought the product. At a time when Lotus 1-2-3 was the killer application, nothing was more enticing than a glimpse of its key data structure: the sparse matrix.
Of course, the spreadsheet is no longer leading edge. So what might its updated version be? Judging by recent market fanfare, I'd say a spreadsheet that is distributed, multi-platform and web-aware. How would you go about building one?
Delphi, the most recent incarnation of Pascal, is not a bad choice—provided you can live within Windows alone. For us, however, Linux compatibility is a must. You could try to master the intricacies of CORBA, but that standard is now engaged in a turf war with Microsoft's DCOM, a creature of even more convoluted behavior. However, there is another choice available to the Linux programmer.
The Modula-3 language and its surrounding system offer a simple, clean, mature and robust tool for writing distributed applications. (See the sidebar “A Brief Biography”.) In this article I'll highlight the steps necessary for building a distributed spreadsheet. My goal is not to provide a full-fledged product, but rather a framework of code that illustrates all the key components.
There are three senses in which a piece of software can be considered “distributed”.
The data and computation can be divided into separate processes. In particular, the data can be viewed from multiple clients (GUI viewers), even though it is stored elsewhere.
The executables can reside on separate machines—for instance, a pair of Linux servers supporting some mixture of Windows and Linux clients.
The work can be distributed between people. You and I may be collaborating remotely on the same spreadsheet, with precautions taken to ensure that I don't overwrite your entries by mistake.
Compared to traditional applications, distributed software is harder to design and get right. In return, it allows for growth and flexible organization.
Three basic ingredients are required by our task:
A spreadsheet object: Initially, it is enough to use a two-dimensional array. Once our application is up and running, experience will help refine the object's interface. Later, the fixed array can be replaced with a sparse matrix.
A display widget: Having the user interface separate from the data eases modifications and simplifies the task of cross-platform deployment.
Connecting glue: The spreadsheet object and display widget need to be able to talk to each other.
In Modula-3, Network Objects provide the connecting glue. The beauty is that, as far as your code is concerned, invoking an object somewhere on the Net is nearly as easy as invoking one inside your own program. Most of the hard work is done for you.
As a modern, general-purpose systems programming language, Modula-3 is lean in design, yet practical and powerful. Applications range from the fun (multiuser games), to the serious (operating systems), to the deadly serious (911 call centers). Ten years of use have made the reference compiler solid and dependable.
Current implementations exist for Win32 and popular incarnations of Unix. The Linux port, in particular, receives constant attention. Several versions are available for download, including the full source tree. (For pointers, see the sidebar “Modula-3 Resources”).
Beyond openness, the language has numerous features to recommend it, including:
A clean, Algol-derived syntax
Explicit support for modules and interfaces
A mechanism for calling external C code and libraries
Both traditional and object types (with single inheritance)
Built-in threads and mutexes for multi-threaded programming
Assertions and exceptions to support error handling
An incremental garbage collector to simplify memory usage
If this reminds you of Java, that's no accident. Though the syntax of Java is derived from C++, many key improvements descend directly from Modula-3. One implementation of Modula-3 even allows mix-and-match integration with Java.
Features located in “the first ring out”, though not defined in the language itself, include:
Quake, a simplified build language that replaces make
Standard libraries of algorithms and container objects
A lightweight database component
A multi-platform windowing system with user interface toolkit
Network objects allow us to proceed in stages. First, a spreadsheet can be constructed as a single executable. Next, as multiple processes running on one machine. Finally, as multiple processes running over multiple machines. The jumps between stages are small.
We need some underlying data structure for our spreadsheet, so let's begin simply by typing:
TYPE Grid = REF ARRAY OF ARRAY OF INTEGER;
TYPE Grid = REF ARRAY OF ARRAY OF Money.T;
This defines a two-dimensional grid of integers (in the first line), or, as a second option, of type Money.T. Integers are a built-in type. Money.T is a programmer-defined type; the “.T” suffix is a Modula-3 convention. (In a real spreadsheet, each column would have a distinct user-defined type. Let that detail pass for now.)
A new grid can be allocated on the heap during variable declarations, if you wish, or during program execution.
VAR
  myGrid: Grid := NEW(Grid, rows, cols);
BEGIN
  myGrid := NEW(Grid, 100, 20);
END.
The second assignment of myGrid will wipe out the first, but don't be alarmed—we do not have a memory leak. The Modula-3 garbage collector takes care of reclaiming lost memory. This is also true of object variables (no destructors necessary), including objects that allocate memory on remote machines.
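As a minimal sketch of the allocation described above, the following complete program declares the integer version of Grid, allocates it twice and reports the dimensions of the second grid. The dimensions and cell values are arbitrary, and the IO and Fmt interfaces are assumed to come from the standard libm3 library.

```modula3
MODULE Main;
(* Sketch only: dimensions and cell values are arbitrary choices. *)
IMPORT IO, Fmt;

TYPE
  Grid = REF ARRAY OF ARRAY OF INTEGER;

VAR
  myGrid: Grid := NEW(Grid, 3, 4);      (* allocated at declaration *)

BEGIN
  (* The 3 x 4 grid becomes unreachable here; the collector reclaims it. *)
  myGrid := NEW(Grid, 100, 20);

  (* Indexing a REF ARRAY dereferences it implicitly. *)
  myGrid[0, 0] := 42;
  myGrid[99, 19] := 7;

  IO.Put("rows = " & Fmt.Int(NUMBER(myGrid^)) & "\n");
  IO.Put("cols = " & Fmt.Int(NUMBER(myGrid[0])) & "\n");
  IO.Put("sum  = " & Fmt.Int(myGrid[0, 0] + myGrid[99, 19]) & "\n");
END Main.
```

Cells are assigned before they are read because Modula-3 does not promise that a freshly allocated integer holds zero, only that it holds some legal value of its type.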
To flesh out our spreadsheet object, we next attach some operator methods to the grid. A good place for this is in a separate “interface” file. Listing 1 contains an initial cut at spreadsheet.i3. Our object is now declared to be a Spreadsheet.T type.
The important property of an interface is that it contains no executable code whatsoever. That's reserved for “.m3” or module files. The interface does not say how something is computed, merely what it does. This is similar to .h files in C, but is more strict. Only the operations explicitly exposed in an interface—or “exported” to use the jargon—are available for outside use.
(The sharp reader may have noticed that the representation of Grid is exposed in spreadsheet.i3—a bad thing. Modula-3 does allow you to hide details of representation inside implementation files. That would take us into a discussion of opaque types, however, a more advanced topic.)
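To make the idea of an interface concrete, here is a guess at the shape such a file might take. It is not the article's Listing 1: the method names (init, get, put) and the procedures bound to them are illustrative assumptions, while the grid and name fields follow the fragments shown elsewhere in this article. It also assumes that a Money interface is available.

```modula3
INTERFACE Spreadsheet;
(* Hypothetical sketch, not the article's Listing 1. *)
IMPORT Money;

TYPE
  Grid = REF ARRAY OF ARRAY OF Money.T;

  T = OBJECT
    grid: Grid;
    name: TEXT;
  METHODS
    (* Method names and signatures are assumptions for illustration. *)
    init(rows, cols: CARDINAL; name: TEXT): T := Init;
    get(row, col: CARDINAL): Money.T          := Get;
    put(row, col: CARDINAL; value: Money.T)   := Put;
  END;

(* Signatures only: the bodies belong in Spreadsheet.m3. *)
PROCEDURE Init(self: T; rows, cols: CARDINAL; name: TEXT): T;
PROCEDURE Get(self: T; row, col: CARDINAL): Money.T;
PROCEDURE Put(self: T; row, col: CARDINAL; value: Money.T);

END Spreadsheet.
```

Note that nothing here can execute: the interface names the operations and their types, and leaves every procedure body to the module.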
Modula-3 comes with a multi-platform windowing system called Trestle. Built upon Trestle is a user interface toolkit called VBTkit, and a UI builder, FormsVBT. You may call X directly if you wish (alternatively, the Win32 GDI), but in doing so you lose portability.
A description of your program's user interface is called a “Trestle Form”. A form is a textual description of names and values, organized using nested parentheses. Form elements consist of windows, frames, buttons and so on, as well as properties such as color. Listing 2 is a sample form for a popup calculator, as shown in Figure 1.
The important point is that a form is defined in its own file, outside any Modula-3 code. This separation of concerns proves valuable when the user interface designer is a different person from the primary coder. The form does not describe how to construct the interface, merely what it looks like. The FormsVBT library builds it at run time and hooks it into your code.
Figure 1. Appearance of Calculator.fv
Suppose our spreadsheet is implemented, along with a suite of test functions. To build a program, we must tell the compiler which source files make up our executable. This is done in a Modula-3 make file, or m3makefile. An example is shown in Listing 3.
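For orientation, an m3makefile for a small program might look roughly like the sketch below. It is not Listing 3: the module names Spreadsheet and Main and the program name sheet are assumptions chosen for illustration.

```quake
% m3makefile: illustrative sketch, not the article's Listing 3
import("libm3")          % standard library (IO, Fmt, Text, ...)

module("Spreadsheet")    % Spreadsheet.i3 plus Spreadsheet.m3 (assumed names)
implementation("Main")   % Main.m3, the test driver (assumed name)

program("sheet")         % name of the resulting executable (assumed)
```

Each line is a call to a quake procedure; there are no dependency rules to write by hand.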
To build your program, at the command-line prompt type:
m3build
The compiler will determine dependency relations for you, recompiling only what is necessary.
Converting a regular object (restricted to a single address space) to a network object (visible over the Net) is not as difficult as you might imagine. You must attend to four details.
First, the network object library needs to be linked in. This is performed in the m3makefile (Listing 3).
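In m3makefile terms, this first detail might amount to two extra lines, sketched below. The import line names the network object package; the netobj line is my assumption about how the stub generator is invoked for the type Spreadsheet.T, so check it against the templates shipped with your installation.

```quake
% Additions for network objects (sketch, not the article's Listing 3)
import("netobj")             % link the network object runtime
netobj("Spreadsheet", "T")   % assumed call: generate stubs for Spreadsheet.T
```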
Second, make the following two changes to the spreadsheet interface:
IMPORT Money;
IMPORT NetObj;             (* new statement *)
TYPE
  T = NetObj.T OBJECT      (* modified line *)
    grid: Grid;
    name: TEXT;
  METHODS
    ...
Third, and this matters only at execution time, a network object daemon needs to be running in the background. The program is supplied as part of Modula-3. Start the daemon by typing:
netobjd &
In a client-server architecture, the spreadsheet object resides with the server, yet it is the client that issues method calls (to update a cell, for example). Client and server need to find each other. This is the fourth detail.
The netobj daemon acts like a bulletin board. First, the server posts a note saying, “I've got a spreadsheet object for sale.” Then the client comes along and says, “I'll buy that.” The server exports; the client imports; the daemon mediates. In the nomenclature of CORBA, the daemon is an object request broker. Once the sale is complete, the client and server talk to each other directly. Code details are found in Listing 4.
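The export and import steps can be sketched as two small programs. This is not Listing 4: the name "Budget" is a hypothetical label chosen for illustration, and Spreadsheet.T is assumed to be a concrete subtype of NetObj.T whose stubs have already been generated. The calls NetObj.Export and NetObj.Import are the ones supplied by the network object library.

```modula3
(* ---- Server.m3: posts the spreadsheet on the bulletin board ---- *)
MODULE Server EXPORTS Main;
IMPORT IO, NetObj, Spreadsheet, Thread;

VAR
  sheet := NEW(Spreadsheet.T);   (* assumed to be a NetObj.T subtype *)
BEGIN
  TRY
    NetObj.Export("Budget", sheet);        (* "Budget" is a made-up name *)
    IO.Put("spreadsheet exported\n");
    LOOP Thread.Pause(60.0D0) END;         (* stay alive to serve clients *)
  EXCEPT
  | NetObj.Error   => IO.Put("export failed: is netobjd running?\n");
  | Thread.Alerted => IO.Put("interrupted\n");
  END;
END Server.

(* ---- Client.m3: picks the spreadsheet up again ---- *)
MODULE Client EXPORTS Main;
IMPORT IO, NetObj, Spreadsheet, Thread;

VAR
  sheet: Spreadsheet.T;
BEGIN
  TRY
    (* With no address argument, the daemon on this machine is asked. *)
    sheet := NARROW(NetObj.Import("Budget"), Spreadsheet.T);
    IF sheet = NIL THEN
      IO.Put("nothing has been exported under that name\n");
    ELSE
      IO.Put("got the spreadsheet; method calls now run on the server\n");
    END;
  EXCEPT
  | NetObj.Error   => IO.Put("import failed: is netobjd running?\n");
  | Thread.Alerted => IO.Put("interrupted\n");
  END;
END Client.
```

The two modules are separate compilation units, each built into its own executable. After the import succeeds, the daemon plays no further part in the conversation.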
Listing 4 will work when the server and client are located on the same machine. Suppose instead that the server runs on some Linux box—eggnog.cmu.edu—and that the clients are elsewhere. Ensure that netobjd is running on eggnog and change one line in the client program.
address := NetObj.Locate( "eggnog.cmu.edu" );
With that, our programs now talk over the Net.
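Put in context, the changed client might look like the sketch below. As before, the export name "Budget" is hypothetical and Spreadsheet.T is assumed to be a subtype of NetObj.T; only NetObj.Locate and the extra argument to NetObj.Import differ from a same-machine client.

```modula3
MODULE RemoteClient EXPORTS Main;
(* Sketch: "Budget" is a hypothetical export name, and Spreadsheet.T
   is assumed to be a subtype of NetObj.T. *)
IMPORT IO, NetObj, Spreadsheet, Thread;

VAR
  address: NetObj.Address;
  sheet: Spreadsheet.T;
BEGIN
  TRY
    (* Find the daemon on the server machine, then ask it for the object. *)
    address := NetObj.Locate("eggnog.cmu.edu");
    sheet := NARROW(NetObj.Import("Budget", address), Spreadsheet.T);
    IF sheet = NIL THEN
      IO.Put("eggnog has no object named Budget\n");
    ELSE
      IO.Put("connected to the remote spreadsheet\n");
    END;
  EXCEPT
  | NetObj.Invalid => IO.Put("not a valid host name\n");
  | NetObj.Error   => IO.Put("cannot reach netobjd on eggnog\n");
  | Thread.Alerted => IO.Put("interrupted\n");
  END;
END RemoteClient.
```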
Because Modula-3 comes ready-made with thread support, it also provides mutexes (mutual exclusion semaphores) so that parallel operations on the same datum are serialized. In our discussion so far, the Money.T type has been left unspecified. It might actually be something like this:
INTERFACE Money;
TYPE
  T = MUTEX OBJECT
    cents: INTEGER;
  END;
END Money.
Mutexes protect data so that client B does not modify values before client A is finished. Granted, protecting each cell separately is overkill. A more elegant approach is to protect ranges of cells, with the lock initiated by user action.
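To see per-cell locking at work, consider the following self-contained sketch. It declares a local Money type of the same shape as Money.T and lets two threads, standing in for clients A and B, update one cell concurrently. The Deposit procedure, the Worker type and the loop counts are all invented for illustration.

```modula3
MODULE Main;
(* Sketch: two threads stand in for two clients updating one cell. *)
IMPORT IO, Fmt, Thread;

TYPE
  Money = MUTEX OBJECT
    cents: INTEGER := 0;
  END;

  (* A thread body that carries a reference to the shared cell. *)
  Worker = Thread.Closure OBJECT
    cell: Money;
  OVERRIDES
    apply := Run;
  END;

PROCEDURE Deposit(m: Money; amount: INTEGER) =
  BEGIN
    LOCK m DO                      (* only one thread at a time gets in *)
      m.cents := m.cents + amount;
    END;                           (* the mutex is released here *)
  END Deposit;

PROCEDURE Run(self: Worker): REFANY =
  BEGIN
    FOR i := 1 TO 1000 DO Deposit(self.cell, 1) END;
    RETURN NIL;
  END Run;

VAR
  cell := NEW(Money);
  a, b: Thread.T;
BEGIN
  a := Thread.Fork(NEW(Worker, cell := cell));
  b := Thread.Fork(NEW(Worker, cell := cell));
  EVAL Thread.Join(a);
  EVAL Thread.Join(b);
  IO.Put("cents = " & Fmt.Int(cell.cents) & "\n");
END Main.
```

Because every update passes through the LOCK statement, no increment is lost however the two threads interleave. Locking a range of cells would follow the same pattern, with one mutex guarding the whole range instead of one per cell.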
Figure 2 shows a spreadsheet from the point of view of user A (Alice). She is working on the cell range tinted red. User B (Bob) cannot modify these cells. He is working on the cells tinted blue, which tells Alice that they are read-only for her.
Figure 2. Simple Multiuser Spreadsheet
To port our user interface program from Linux to Windows NT, do the following:
Archive the client source code by using the tar command.
Copy the tar file to your Windows machine.
Unarchive the file using tar. Convert end-of-line markers.
At the command line, type m3build.
Assuming there are no stunts of low-level programming, all the Modula-3 code in this example—including the GUI—is transparently portable. Differing path name conventions, for example, are hidden behind OS-independent interfaces. There's not an #ifdef in sight.
In this article I've highlighted the creation of a multi-platform, distributed spreadsheet using Modula-3. The key step is to wrap the spreadsheet into a network object. In this way, remote objects may be invoked with exactly the same syntax as local objects. Most of the hard work is done for you.
Modula-3 is not the only means for creating distributed applications, but in my mind it strikes an optimal balance between simplicity and power. It is, by design, a language for building large, solid systems, so that you can get your work done.
Clearly, my discussion has omitted many details. To help fill this gap, a companion tutorial is available on the Web (see the sidebar “Getting Started”). Full source code is available for experimentation and invention.
John Kominek holds a master's degree in Computer Science from the University of Waterloo, and is currently a graduate student at CMU. When pressed, he admits to pronouncing Linux to rhyme with Linus. He can be reached via e-mail at email@example.com.