Introducing SOAP

SOAP is something you may find a use for, even if you're not interested in three-tier web applications.

In the January and February installments of “At the Forge”, I demonstrated a simple three-tier web application using a database, web server and the Mason templating system for mod_perl. We were able to see some of the advantages and disadvantages of a three-tier web application, particularly when compared with its two-tier counterpart.

But as I pointed out last month, our three-tier architecture was incomplete and wasn't necessarily a fair demonstration. That's because our Perl middleware object layer had to reside on the same computer as the components we wrote for HTML::Mason, a templating system built on mod_perl. Depending on how you count things, this might be considered a two-tier application, albeit one with an object-oriented abstraction layer between the tiers.

In order to put the Mason components and Perl objects on separate computers, we somehow need the ability to call an object method across a network. That is, the following line of Perl would work, regardless of whether $object resides on the same computer as our Apache server or somewhere else on the Internet:

$object->method($arg1, $arg2);
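To see how the same calling syntax could hide a network hop, here is a minimal sketch of a stub class. RemoteStub and the host name are hypothetical and exist only for this illustration. The stub uses Perl's AUTOLOAD mechanism to capture any method name and its arguments. A real stub would then serialize the call and send it elsewhere, whereas this one merely describes what it would send:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stub class; RemoteStub is not part of any library.
package RemoteStub;

our $AUTOLOAD;

sub new {
    my ($class, %args) = @_;
    return bless { host => $args{host} }, $class;
}

# AUTOLOAD runs whenever an undefined method is called on the stub.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package prefix
    return if $name eq 'DESTROY';         # ignore object destruction

    # A real stub would serialize the call and send it over the network.
    # This sketch only returns a description of the call.
    return { host => $self->{host}, method => $name, args => [@args] };
}

package main;

my $object = RemoteStub->new(host => 'objects.example.com');
my $call   = $object->method('first', 'second');

print "Would send $call->{method}(@{ $call->{args} }) to $call->{host}\n";
```

The calling code never learns whether the method ran locally or was forwarded, which is the property we want from a distributed-object system.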

Distributed-object technology and remote-procedure calls have existed for many years on a variety of platforms. In almost every case, this technology was restricted to a particular language or platform. DCOM (Distributed Component Object Model) allows objects of any language to communicate but only under Windows. Java's RMI (Remote Method Invocation) can communicate only with other Java objects. CORBA is an exception to this, allowing objects to communicate across platforms and languages, but CORBA is complex, has taken a while to get off the ground and isn't yet a part of most programmers' knowledge base.

In response to these proprietary and complex protocols, a number of people in the Internet community have created SOAP, the Simple Object Access Protocol, which makes it extremely easy to create distributed applications. Two of the biggest proponents of SOAP have been Dave Winer (famous for his Scripting News “weblog”) and Microsoft, which is not usually associated with open standards and cross-platform protocols. Regardless of what we in the Linux community might think, Microsoft has publicly embraced SOAP, making it a cornerstone of its .NET effort.

SOAP History and Concepts

SOAP depends on the idea that any two computers on the Internet can communicate using HTTP, the protocol that powers the Web. (Actually, SOAP can be transmitted over nearly any high-level protocol, including SMTP and POP3, but HTTP is by far the most common.) It then transmits information using XML, the markup language that allows us to create tags and document standards. The server turns the incoming XML into an object method call, and then turns the object's response into an XML document that is returned as the HTTP response. Since HTTP and XML are both open standards, published by the IETF and the World Wide Web Consortium, respectively, they can be (and are) implemented on a variety of platforms, so clients and servers written in different languages can interoperate.
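To make the XML half of this concrete, here is a rough sketch of how a client might wrap a method call in a SOAP 1.1 envelope. The envelope namespace is the one defined for SOAP 1.1. Everything else is an assumption made for illustration: the helper names, the urn:Foo/Bar namespace, and the arg1, arg2 parameter names. A real toolkit would also encode data types, which this sketch ignores:

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a SOAP 1.1 request envelope for a single method call.
# The namespace value passed in below is hypothetical.
sub build_envelope {
    my ($namespace, $method, @args) = @_;
    my $ns     = xml_escape($namespace);
    my $params = '';
    my $i      = 0;
    for my $arg (@args) {
        $i++;
        $params .= "      <arg$i>" . xml_escape($arg) . "</arg$i>\n";
    }
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:$method xmlns:m="$ns">
$params    </m:$method>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
XML
}

print build_envelope('urn:Foo/Bar', 'method', 'first', 'a < b');
```

The server's reply travels back in the same kind of envelope, with the method's return value in the body.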

The predecessor to SOAP, known simply as XML-RPC, provided a simple mechanism for remote procedure calls (RPC) using data formatted in XML and transmitted over HTTP. For a variety of reasons, including the fact that XML-RPC could not handle advanced data structures, the W3C adopted SOAP.

A number of languages and platforms continue to support XML-RPC, and it's possible that some situations might call for its use because it has a smaller overhead. Practically speaking, however, the fact that SOAP has gotten so much attention has led to the development, use and debugging of its libraries to a much greater extent than those for XML-RPC. As of this writing, there are also more implementations of SOAP than of XML-RPC, meaning that your choice of platform or language might force your hand toward one protocol or the other.

SOAP, as its name implies, expects to work with objects rather than simple procedure calls. Thus, a SOAP client invokes a method on a particular object on the server. The method is specified in the body of the XML document itself, while the object with which it is associated is named in an HTTP “SOAPAction” header. Of course, we also need to specify a computer name and port to which the SOAP request can be directed.
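The division of labor between the header and the body can be sketched by composing the text of such a request by hand. The host, port, path and the build_http_request helper are hypothetical, and the exact form of the SOAPAction value varies among toolkits, so treat the quoted object name below as an assumption rather than a rule:

```perl
use strict;
use warnings;

# Compose the text of an HTTP POST that carries a SOAP request.
# The object is named in the SOAPAction header; the method lives in the body.
sub build_http_request {
    my (%p) = @_;
    my $length = length $p{body};   # correct byte count for an ASCII body
    my @lines = (
        "POST $p{path} HTTP/1.0",
        "Host: $p{host}:$p{port}",
        'Content-Type: text/xml; charset=utf-8',
        "Content-Length: $length",
        'SOAPAction: "' . $p{object} . '"',
        '',                          # blank line ends the headers
        $p{body},
    );
    return join "\r\n", @lines;
}

# A deliberately tiny envelope; the method and argument names are made up.
my $body = '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
         . '<SOAP-ENV:Body><method><arg1>first</arg1></method></SOAP-ENV:Body>'
         . '</SOAP-ENV:Envelope>';

print build_http_request(
    host   => 'objects.example.com',
    port   => 80,
    path   => '/soap',
    object => 'Foo/Bar',
    body   => $body,
), "\n";
```

In practice, a SOAP library assembles all of this for us, but seeing the raw request makes it clear that nothing more exotic than an HTTP POST is involved.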

The server itself, including its name and the port number on which the SOAP request is transmitted, is known as the SOAP proxy. This makes sense when you consider that the HTTP server is simply relaying an object method invocation and isn't doing any of this work by itself. Do not confuse the SOAP proxy with an HTTP proxy. An HTTP proxy relays requests from an HTTP client to an HTTP server and often performs security checks and caching. A SOAP proxy, by contrast, relays messages between a SOAP client and an object on the proxy's computer.
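The relaying step might look something like the following sketch, which assumes the incoming XML has already been decoded into a class name, a method name and a list of arguments. The Greeter class, the dispatch function and the %allowed list are all hypothetical. Restricting calls to an explicit list of classes is a design choice of this sketch, made so that a remote client cannot invoke arbitrary code on the server:

```perl
use strict;
use warnings;

# Hypothetical class that lives on the server.
package Greeter;

sub hello {
    my ($class, $name) = @_;
    return "Hello, $name";
}

package main;

# Only classes listed here may be called by remote clients.
my %allowed = (Greeter => 1);

# Relay a decoded request to a local class method and return its result.
sub dispatch {
    my ($class, $method, @args) = @_;
    die "Unknown object: $class\n" unless $allowed{$class};
    my $code = $class->can($method)
        or die "No such method: $method\n";
    return $class->$code(@args);
}

print dispatch('Greeter', 'hello', 'world'), "\n";
```

The server would then wrap the return value in an XML document and send it back as the HTTP response.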

The object for which the SOAP server acts as a proxy is sometimes known as the endpoint and is specified in a “SOAPAction” HTTP header. The name of the endpoint can be virtually any text string, including hierarchy separators such as :: and /. In practice, the endpoint has a direct connection to the object hierarchy associated with the language in which the SOAP proxy is written. In Perl, the endpoint might be something like “Foo/Bar”, which refers to the Foo::Bar object located in the file Foo/