Writing a Program to Control OpenOffice.org, Part 1

Learn how to leverage an existing application to create your own office automation program. First up, a vocabulary and design lesson.

The growing use of office automation programs poses a new problem for software programmers. Users are no longer satisfied with programs that merely give them the requested data. Graphic presentations not only must be clear, they also must be pleasant and, as it were, fascinating. This is an understandable requirement, because word processors and spreadsheets are widely used, and they allow users to attain good results, aesthetically speaking, rather easily. On the other hand, small- and medium-sized software houses can't afford to spend thousands of man-hours trying to compete with Lotus or OpenOffice.org. A good solution is to exploit the services those competitors already provide.

Two basic ways are available to invoke these services. The first one requires us to write some procedures, using languages such as VBA or Star Basic, and then make the service supplier process them. The other option is to rely on interprocess communication technologies. The second solution is, in my opinion, the better one from a programmer's point of view.

Once we have decided on our strategy, we must choose the instruments to make it work: an office automation suite and a communication standard. I have opted to use OpenOffice.org and UNO. The first element is well known and my choice is easily justifiable--OOo is reliable, widely used and cross-platform. But what about UNO? UNO, short for Universal Network Objects, is an interprocess communication technology designed by OpenOffice.org and Sun to allow software developers to control the programs that make up the suite of the same name.

The aim of this series of articles is to explain the UNO programming principles. To accomplish this goal, we will build an application written in C++ that is able to connect to OpenOffice.org, open a spreadsheet and then update, print and close the document. The problems that must be solved in order to build the source code will allow beginners to understand the basic principles of this technology. These problems are:

  1. installing the OpenOffice.org software development kit

  2. building the files to implement the communication process with OpenOffice.org

  3. understanding the basic structure of UNO: what are services, service-factories and interfaces?

  4. writing the source code

  5. writing a Linux makefile

The first three points are treated here in Part 1 of this article series.

Installing the OpenOffice.org SDK

Currently, no compiler is UNO compliant. We therefore need a software development kit (SDK), a set of programs and libraries allowing Java, C++ and Star Basic developers to use UNO.

UNO is available for Linux, Windows and Solaris. The software prerequisites for Linux users are:

  • OpenOffice.org 1.1.x or higher

  • JDK 1.4.1_01 or higher

  • GNU GCC; releases 2.91.x, 2.95.x, 3.0.x, 3.1.x, 3.2.x work fine. The 3.3.x ones cause some runtime compatibility problems; these will be discussed in a future article

  • GNU make 3.79.1 or higher

In addition, although it is not indispensable, I suggest installing stlport; otherwise, managing the Standard Template Library could present some problems. stlport is available here.

The UNO installation process is simple and includes three steps. First, download the SDK tar file, which is available here. Second, unpack the tar archive. Third, configure some environment variables. The most important variables are OO_SDK_HOME, the SDK installation directory, and OFFICE_HOME, the OpenOffice.org installation directory. This configuration task can be performed by a pair of scripts stored in the SDK base directory. The first script asks users for the right values to assign to the variables and then writes them to the second one, setsdkenv_linux. The second script must be called to carry out the whole operation. In this regard, we have to be aware that the configuration performed by setsdkenv_linux affects only the terminal window in which it is executed.
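
Because these settings last only for the terminal in which setsdkenv_linux was run, it can be handy to check them before starting a build. The following is a minimal sketch of such a check. The function name missingSdkVariables is made up for this example and is not part of the SDK; only the two variable names come from the text above.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Returns the names of the SDK variables that are unset or empty
// in the environment of the current process.
// Hypothetical helper for illustration; not part of the OOo SDK.
std::vector<std::string> missingSdkVariables() {
    const char* required[] = { "OO_SDK_HOME", "OFFICE_HOME" };
    const std::size_t count = sizeof(required) / sizeof(required[0]);

    std::vector<std::string> missing;
    for (std::size_t i = 0; i < count; ++i) {
        const char* value = std::getenv(required[i]);
        if (value == NULL || *value == '\0') {
            missing.push_back(required[i]);
        }
    }
    return missing;
}
```

A build tool could call this function at startup and print the returned names, reminding the user to run setsdkenv_linux in the same terminal first.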

Building the Files for the Communication Process

Like CORBA and COM, UNO does not refer to a particular programming language. In order to define data and services, it uses the UNO Interface Definition Language (UNOIDL), a meta-language similar to CORBA IDL and to the IDL used with COM. The UNOIDL files, with the .idl extension, are stored in OpenOffice sdk directory/idl. Each of them describes an interface or a service. They are similar to source code, in that procedures and applications can't directly call them. That is why the OOo SDK supplies a collection of software tools able to translate UNOIDL files, making them usable by C++ or Java programs. Here, we solve the problem from the point of view of C++ programmers.

First of all, we must use the .idl files as a base to write the header files we need. The programs that allow us to reach our goal must be called from the command line and are idlc, regmerge and cppumaker, in this order. (For more about these development tools, read OpenOffice sdk directory/docs/tools.html.)

The first program, idlc, is a compiler and has the following syntax:


idlc [-options] file1.idl ... filen.idl

For each .idl file, idlc creates a binary one with the .urd (UNO reflection data) extension. The structure of .urd files is a tree of classes, where the base class is the root. All the .urd files must be merged into one .rdb (registry database) file, which we refer to when we write the C++ code. Building this registry is the task of regmerge, which also can work with .rdb files. Its syntax is:


regmerge <registry_file_name> <start_level> file1 ... filen

Here is an example:


regmerge prova.rdb / prova1.urd 
regmerge prova.rdb / prova2.rdb

All the levels of prova1.urd and prova2.rdb--the / means we start from the root of the trees--are merged into prova.rdb. The .rdb files are binary, as are the .urd ones, but we can read their structures using a program called regview.

Finally, cppumaker builds our header files, starting from a registry. More precisely, cppumaker writes one file for each interface, organizing them in a directory tree that mirrors the inner structure of the .urd files. Its syntax is:


cppumaker [-options] file1 file2 ... filen

Two options often are used with cppumaker. -O specifies the starting path of the header files, while -B is used to choose the starting level of the registry file, usually UCR.

To write all the header files referring to prova.rdb, starting from the current directory, we enter:


cppumaker -BUCR -O. prova.rdb

We must compile every service and interface called by our application. This is a long and error-prone process, because of the great number of files it involves. We can shorten this operation by using two OpenOffice registry files, services.rdb and types.rdb. They are located in OpenOffice directory/program. regmerge can merge them into one registry file that later can be processed by cppumaker. Doing it this way gives us a registry bigger than we need, but in my opinion, the saved time is worth this price.
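
The shortcut just described can be sketched as a small helper that assembles the two command lines involved. This is only an illustration: the function name headerBuildCommands and its parameters are invented here, the commands use nothing beyond the regmerge and cppumaker forms shown earlier in this article, and the office directory depends on your installation.

```cpp
#include <string>
#include <vector>

// Builds the command lines that merge the two stock registries into one
// and then generate the C++ headers from the merged registry.
// Hypothetical helper: it only composes strings and runs nothing.
std::vector<std::string> headerBuildCommands(const std::string& officeDir,
                                             const std::string& registry,
                                             const std::string& outputDir) {
    std::vector<std::string> commands;

    // regmerge <registry_file_name> <start_level> file1 ... filen
    commands.push_back("regmerge " + registry + " / " +
                       officeDir + "/program/types.rdb " +
                       officeDir + "/program/services.rdb");

    // -B selects the registry start level, -O the header output path.
    commands.push_back("cppumaker -BUCR -O" + outputDir + " " + registry);

    return commands;
}
```

Printing the returned strings, or passing them to a shell from a makefile-like driver, keeps the order of the two steps in one place.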

Understanding the Basic Structure of UNO

Thus, there are two types of files: .idl files describe data and services, while binary ones implement those data and services. Therefore, we can argue that UNO has two layers, one for each kind of file. After this preliminary comment, though, we have to study the UNO terminology. In my opinion, this is pivotal because we can't fully understand the source code we are going to write without knowing the meaning that words such as service, interface, property and service manager have in UNO (see the Developer's Guide in OpenOffice sdk directory/docs/DevelopersGuide/DevelopersGuide.html).

I think that services are the best starting point, as they are linked directly to the well-known concept of objects. The authors of the above-mentioned guide often use the word service as a synonym for object. However, there is a clear difference between the two terms. While an object is the instance of a class, a service is the abstract description of an object. More precisely, according to the Guide, "A service comprises a set of interfaces and properties that are needed to support a certain functionality. It can include other services as well. Services are abstract specifications which have to be implemented." A simile can make this concept clearer. Let's consider flow charts. They describe algorithms by using graphic symbols, without dealing with problems such as memory allocation or runtime crashes. They are abstract, and we can't use them directly, but they give us all the directives we need to build executable programs. We could say almost the same for services; they are mere descriptions. The adjective abstract, applied to services, suggests that they are not able to do anything by themselves; their only function is to expose some interfaces and properties, accessible through the methodologies described below.

The above definition mentions interfaces and properties, too. An interface is a collection of methods describing one aspect of its service and making it concrete. Generally speaking, it is a class, and its implementation depends on the programming language. Properties, instead, are a set of service characteristics, each of them defined by a name-value pair. Two general methods, getPropertyValue and setPropertyValue, allow us to interact with properties. For example, Figure 1 shows two services, com.sun.star.document.OfficeDocument and com.sun.star.text.TextDocument, with their interfaces.

Figure 1. Two Services and Their Interfaces
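
To make the idea of name-value properties concrete, here is a toy model written in plain C++. It is not the UNO API: the class PropertyBag is invented for this sketch, values are simplified to strings and an unknown name is reported with a standard exception. Only the two method names are borrowed from the text above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-in for a set of properties; hypothetical, not the UNO API.
class PropertyBag {
public:
    // Stores or overwrites the value kept under the given name.
    void setPropertyValue(const std::string& name, const std::string& value) {
        values_[name] = value;
    }

    // Returns the value kept under the given name.
    // Throws std::out_of_range if no such property has been set.
    std::string getPropertyValue(const std::string& name) const {
        std::map<std::string, std::string>::const_iterator it = values_.find(name);
        if (it == values_.end()) {
            throw std::out_of_range("unknown property: " + name);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> values_;
};
```

The point of the sketch is that the caller needs to know only property names, not the layout of the object that holds them.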

But how can we build a new service? This is the task of the service manager, also known as the service factory. As the name suggests, a service factory can be seen as a service that builds other services. A service factory needs to know the name of the service to be created and nothing more. At first glance, the very idea of a service factory may seem useless, as it forces the programmer to write some extra code.

And the same criticism is applicable to the concept of services themselves: why don't we replace services with directly instantiable classes? These objections are not groundless, but introducing services and service managers greatly improves the flexibility of the UNO architecture, for two reasons. First, not using service managers would mean transforming services into instantiable classes. Therefore, if OOo programmers updated a service, they would have to rewrite the entire code of the respective class and modify the architecture of UNO. Because service managers create services based only on their names, client code is insulated from such changes. Second, services describe what must be done, not how it must be done. Every detail about the actual execution of a task is under the care of interfaces; that is, they rely on a different software layer. Writing new releases of the code is simple, because updating a particular interface does not affect its service.
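
The argument above can be illustrated with a toy factory that creates objects by name. This is a sketch of the general pattern, not of UNO itself: XGreeter, EnglishGreeter, ServiceFactory and the service name used in the usage note are all invented, and the code assumes a C++11 compiler rather than the older GCC releases listed earlier.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical interface: a group of methods covering one aspect of a service.
struct XGreeter {
    virtual ~XGreeter() {}
    virtual std::string greet() const = 0;
};

// One possible implementation. Client code never names this class.
struct EnglishGreeter : XGreeter {
    std::string greet() const override { return "hello"; }
};

// Toy factory: maps a service name to a function that creates an instance.
class ServiceFactory {
public:
    typedef std::function<std::shared_ptr<XGreeter>()> Creator;

    // Associates a service name with the code that builds it.
    void registerService(const std::string& name, Creator creator) {
        creators_[name] = creator;
    }

    // Returns a new instance, or a null pointer if the name is unknown.
    std::shared_ptr<XGreeter> createInstance(const std::string& name) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(name);
        if (it == creators_.end()) {
            return std::shared_ptr<XGreeter>();
        }
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};
```

A client would register a creator under a name such as "example.Greeter" and later ask for that name only. Swapping EnglishGreeter for another class then changes one registration call and no client code, which is the flexibility the text attributes to service managers.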

Services and service managers play an important role in the communication between two processes. Two services belonging to two different processes can exchange information by way of TCP/IP, building what is called a bridge in the UNO environment, but only if both of them have been created by a service factory. Therefore, for this communication process to work, the OpenOffice.org suite has to act as a TCP/IP server, which implies that, at startup, it must listen on a port. To achieve this aim, we have to modify the configuration file OpenOffice directory/share/registry/data/org/openoffice/Setup.xcu (see the Developer's Guide). For instance, in order to make OOo listen on port 8100 of the local computer, we have to add the following lines within the <node oor:name="Office"> section.


<prop oor:name="ooSetupConnectionURL"> 
   <value>socket,host=localhost,port=8100;urp;StarOffice.ServiceManager
   </value>
</prop>
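
The value in this setting has three parts separated by semicolons: the connection (a type followed by comma-separated name=value parameters), the protocol and the name of the object to contact. The sketch below splits such a string into those parts. It is an illustration only: the UnoConnection structure and the functions split and parseConnection are invented here and are not part of the SDK.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the parts of a connection string.
struct UnoConnection {
    std::string type;                           // e.g. "socket"
    std::map<std::string, std::string> params;  // e.g. host, port
    std::string protocol;                       // e.g. "urp"
    std::string objectName;                     // e.g. "StarOffice.ServiceManager"
};

// Splits text at every occurrence of sep.
std::vector<std::string> split(const std::string& text, char sep) {
    std::vector<std::string> parts;
    std::stringstream stream(text);
    std::string part;
    while (std::getline(stream, part, sep)) {
        parts.push_back(part);
    }
    return parts;
}

// Fills out and returns true if text has the three-part form described
// above; returns false and leaves out untouched otherwise.
bool parseConnection(const std::string& text, UnoConnection& out) {
    std::vector<std::string> sections = split(text, ';');
    if (sections.size() != 3) {
        return false;
    }
    std::vector<std::string> connection = split(sections[0], ',');
    if (connection.empty() || connection[0].empty()) {
        return false;
    }

    UnoConnection result;
    result.type = connection[0];
    for (std::size_t i = 1; i < connection.size(); ++i) {
        std::string::size_type eq = connection[i].find('=');
        if (eq == std::string::npos) {
            return false;  // every parameter must look like name=value
        }
        result.params[connection[i].substr(0, eq)] = connection[i].substr(eq + 1);
    }
    result.protocol = sections[1];
    result.objectName = sections[2];
    out = result;
    return true;
}
```

Applied to the value shown above, the parser yields the type socket, the parameters host=localhost and port=8100, the protocol urp and the object name StarOffice.ServiceManager.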

To summarize, UNO has a two-layer structure. Services and service factories make up the first layer. It depends on neither the operating system nor the programming language, and it has only a descriptive function. The second layer, however, relies on interfaces and properties. It therefore depends on both the operating system and the programming language, and it actually implements services.

In Part 2, we will learn how to write the source code.

______________________

Comments

C++ to Automate Office?!

Anonymous's picture

The aim of this series of articles is to explain the UNO programming principles. To accomplish this goal, we will build an application written in C++ that is able to connect to OpenOffice.org, open a spreadsheet and then update, print and close the document.

I came here from the www.openoffice.org site, hoping to learn a tidbit about how to automate Open Office. Many to whom this automation would appeal are Windows VBA programmers, like me. Now, I know this is a Linux site...etc., but I almost vomited a little in my mouth when I read that your example employs C++.

C++ is obtuse and unlikely to be used by the majority of MS-Office migrators. Is it possible to eliminate C++ (and anything else similarly fugly) from the picture?

The C++ language would like to respond...

Ramon F Herrera's picture

The C++ language would like to clarify something to the readers and one poster in particular. According to dictionaries and experts, the term "obtuse" cannot possibly be applied to us, programming languages, only to you, programmers.

Obtuse, adj:
lacking in insight or discernment; "too obtuse to grasp the implications of his behavior"

dense: slow to learn or understand; lacking intellectual acuity; "so dense he never understands anything I say to him"; "never met anyone quite so dim"; "although dull at classical learning, at mathematics he was uncommonly quick"- Thackeray; "dumb officials make some really dumb decisions";

Respectfully,

C++ Stroustrup

Heaven forbid you

Anonymous's picture

Heaven forbid you contemplate revisiting a language. After all, new versions of it are not announced each year. It's almost like its designers really are focusing on important ideas!

See : A tutorial for Program

Anonymous's picture

See :
A tutorial for Programming OpenOffice.org with Visual Basic http://www.oooforum.org/forum/viewtopic.phtml?t=11854&highlight=

Serge Moutou

No need to vomit, OpenOffice

Anonymous's picture

No need to vomit, OpenOffice has its own Basic.

C++ Free Documentation

Anonymous's picture

I am writing a free document on C++/UNO (220 pages). You can download it here: http://perso.wanadoo.fr/moutou/UNOCpp_AP01.sxw

That doesn't mean this article is not interesting. On the contrary, we can find here interesting information I have not mentioned in my document...
Thank you for your article.
Serge Moutou

Portability

Craig Ringer's picture

I've been interested in UNO for some time. It strikes me as a possible way to use OO.o as an import filter, either by having it convert various formats to OpenDocument then importing the resulting OpenDocument file, or by using UNO to "walk" the file after loading it in OO.o .

I've always been put off by a few things:

  • Portability. I understand that UNO is not very portable, and for my uses that'd be a real problem.
  • Packaging. OO.o's download packages are VERY different to how OO.o is packaged by most linux distributors. I can see the potential for a lot of extreme autohell here.
  • Complexity and learning curve. It seems to be really hard to get a handle on UNO to actually get in and just do something. It's documented in awesome volume, but finding introductory information for C++ has been hard. I'm very glad to see this article because of that.

So far the idea of using PyUNO to handle the OO.o side, combined with some improvements to the Python interface of the app I work on, has seemed like the most attractive option. I'll be interested to see how you find working with UNO in C++ to be.

The issues with gcc 3.3 do not inspire confidence, given that many distros (and Mac OS X) are on to gcc4 now.

Re: Portability

Anonymous's picture

Portability: have a look at http://udk.openoffice.org to learn that UNO is portable. Language bindings are available for C++, Java and Python. Additionally, UNO components can be accessed from OpenOffice.org Basic, CLI and via OLE Automation. UNO runs at least on Linux, Windows and Solaris.

Packaging: OpenOffice.org Linux download packages contain RPM packages. That's the package type used by most of the Linux distributions. For OpenOffice.org 2.0, Debian packages might be available as well.

Complexity: Like with most other software development kits it's a prerequisite to know how to program.

Please take a look at the following pages:
http://api.openoffice.org/SDK/index.html
http://api.openoffice.org/docs/DevelopersGuide/DevelopersGuide.htm
http://api.openoffice.org/docs/cpp/ref/index.html

UNO packaging IS a problem for Gnu/Linux

Anonymous's picture


>>Packaging. OO.o's download packages are VERY different to how OO.o is
>>packaged by most linux distributors. I can see the potential for
>>a lot of extreme autohell here.

>Packaging: OpenOffice.org Linux download packages contain RPM packages.

That might be true for the OpenOffice core, not (at least in the short term) for the UNO extensions the article and the original "packaging autohell" concern are about. See Fixing the problem with OpenOffice extensions for a better description of the problem.

Re: UNO packaging IS a problem for Gnu/Linux

Anonymous's picture

btw.

have you heard of the UNO Runtime Environment (URE) ?

Re: UNO packaging IS a problem for Gnu/Linux

Anonymous's picture

I don't believe it's a problem for them. An UNO package is a zip archive and zip/unzip are usually part of every Linux distro...
