At the Forge - Google Web Services
SOAP, formerly the Simple Object Access Protocol, but now an acronym that officially doesn't stand for anything, provides a relatively easy method for sending an XML-encapsulated query to a server. The server then responds with an XML-encoded response. Over the years, SOAP has strayed far from its simple roots. Although SOAP is still easier to understand, implement and work with than some more complicated protocols (such as CORBA), it is more difficult than most people would like to admit. If I can get away with it, I personally prefer to use XML-RPC for Web services. Although XML-RPC doesn't offer all of the features of SOAP, it is far easier to work with.
That said, Google requires that we use SOAP, and with many good SOAP client libraries available nowadays, we should not be afraid to work with it. Perl programmers have a particularly strong implementation, known as SOAP::Lite, at their disposal. For the programming examples in this article, we use Perl and SOAP::Lite. Note that the Lite part of the module name describes the ease with which programmers can implement Web services, not a stripped-down version of SOAP. You can install the latest version of SOAP::Lite from CPAN by typing:
perl -MCPAN -e 'install SOAP::Lite'
The SOAP::Lite installation will ask you to indicate which tests, if any, you want to perform before installing the module. I normally accept the defaults, but you might want to add to or remove from these depending on your needs.
With SOAP::Lite installed, it's time to write a program that queries Google. But to do that, we need to know the URL of the service, as well as the method that we will be invoking on Google's computer, along with the names and types of any parameters we want to send. We could specify these by hand, but that would mean a lot of work on our part. Moreover, Google currently expects SOAP requests to be pointed at api.google.com/search/beta2. If Google ever decides to change that URL without warning, many people might be surprised and upset.
Luckily, Google has provided a WSDL file, describing the services offered via Google's APIs, as well as the request and response parameters the system accepts. It also describes the endpoint for queries, allowing Google (in theory) to make changes to the service without notifying developers in advance. Of course, this assumes that the WSDL file itself will remain in the same location. It also assumes that the names of the services will not change, and that each of them is documented somewhere, because the choice of which method to invoke still requires human intervention.
WSDL is written in XML, and it is fairly easy to understand, once you realize that it's describing nothing more than the various Web services available on a particular server, including the number, names and types of inputs. Thus, the WSDL entry for doGoogleSearch, which performs the basic Google search of Web content, is defined as follows:
<message name="doGoogleSearch"> <part name="key" type="xsd:string"/> <part name="q" type="xsd:string"/> <part name="start" type="xsd:int"/> <part name="maxResults" type="xsd:int"/> <part name="filter" type="xsd:boolean"/> <part name="restrict" type="xsd:string"/> <part name="safeSearch" type="xsd:boolean"/> <part name="lr" type="xsd:string"/> <part name="ie" type="xsd:string"/> <part name="oe" type="xsd:string"/> </message>
To use WSDL from within a Perl program using SOAP::Lite, we invoke SOAP::Lite->service with the WSDL file's URL. If the file resides on the local filesystem, make sure that the URL begins with file:. For example:
my $google_wsdl = "http://api.google.com/GoogleSearch.wsdl"; my $query = SOAP::Lite->service($google_wsdl);
SOAP::Lite is then smart enough to look through the WSDL and make all of the advertised methods dynamically available, such that we can do the following:
my $results = $query->doGoogleSearch($google_key, $query_string, $starting_page, $max_results, $filter, $geographic_restriction, $safe_search, $language_restriction, 'utf-8', 'utf-8');
Do you see what happened here? There is a one-to-one mapping between the inputs described in the WSDL and the parameters that we pass to $query->doGoogleSearch().
We have now seen the core of our Google search program written in Perl. All that's left is to review the input parameters and the contents of $results, which contains the results returned from Google.
The documentation for the API at www.google.com/apis/reference.html describes the input parameters. All of them are mandatory, but some of them are more important than others. In particular, the Google key and the query string typically will be set, and the others will be set with simple default values, as you can see in Listing 1.