Manipulating OOo Documents with Ruby

Who says you have to wait for some future OS to integrate your office documents with the business applications you develop? Work with OpenOffice.org's XML-based documents using Ruby.

The full metadata file is available from the Linux Journal FTP site [].

In addition to a main Document class, OOo4R defines a Meta class to encapsulate the metadata. A Meta object uses an REXML document to hold the contents of meta.xml and is largely a collection of attributes. Typical usage would be either asking the object for a particular value, such as the author's name, or assigning a value, such as a new title. One way to code this would be to write a series of explicit accessor methods, but that means two methods, a reader and a writer, for every attribute. Alternatively, we can use dynamic method invocation: intercept the accessor messages, find the matching metadata element, and either perform the requested action on it or raise an exception if there is no match.
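To make the trade-off concrete, here is a minimal sketch of the method_missing approach in isolation. It assumes a plain Hash in place of the REXML document, and the class name AttrBag is hypothetical. A single method_missing handles both `title` and `title=` for any known key and falls back to the default NoMethodError for everything else.

```ruby
# Hypothetical illustration: a Hash stands in for the contents of meta.xml.
class AttrBag
  def initialize(attrs)
    @attrs = attrs
  end

  # Called for any message that has no explicit method definition.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=") && @attrs.key?(key.chomp("="))
      # Writer: bag.title = "x" arrives as :title= with one argument
      @attrs[key.chomp("=")] = args.first
    elsif @attrs.key?(key)
      # Reader: bag.title arrives as :title
      @attrs[key]
    else
      super # unknown name: the default behavior raises NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    @attrs.key?(name.to_s.chomp("=")) || super
  end
end

bag = AttrBag.new("title" => "Draft", "creator" => "James")
bag.title = "Final"
puts bag.title                  # prints "Final"
puts bag.respond_to?(:creator)  # prints "true"
```

Defining respond_to_missing? alongside method_missing is a common Ruby convention; without it, objects would answer `respond_to?(:title)` with false even though `title` works.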

The following code example focuses on the Dublin Core metadata elements used in OOo. The Dublin Core Metadata Initiative is an open forum for defining metadata standards, and Dublin Core elements can often be found in RSS feeds and some XHTML documents. Like the other elements in meta.xml, these elements carry a namespace prefix (dc:). Rather than make users know and use these prefixes, we can map a friendly name to the full element name.
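As a rough illustration of the mapping idea, the sketch below parses an abbreviated meta.xml fragment with REXML and looks up elements by friendly name. The sample document, the FRIENDLY table and the `lookup` helper are assumptions made for this example; the namespace URIs are, to our understanding, the ones OOo 1.x files declare. The XPath follows the `*/office:meta` base used by the Meta class.

```ruby
require "rexml/document"

# Abbreviated, hypothetical meta.xml content.
META_XML = <<~XML
  <office:document-meta xmlns:office="http://openoffice.org/2000/office"
                        xmlns:meta="http://openoffice.org/2000/meta"
                        xmlns:dc="http://purl.org/dc/elements/1.1/">
    <office:meta>
      <dc:title>Quarterly Report</dc:title>
      <dc:creator>Jane Doe</dc:creator>
      <dc:language>en-US</dc:language>
    </office:meta>
  </office:document-meta>
XML

# Friendly names mapped to prefixed Dublin Core element names.
FRIENDLY = { "title" => "dc:title", "author" => "dc:creator",
             "language" => "dc:language" }.freeze

# Return the element text for a friendly name, or nil if the element is absent.
def lookup(doc, friendly)
  element = FRIENDLY.fetch(friendly) # raises KeyError for unknown names
  node = doc.elements["*/office:meta/#{element}"]
  node && node.text
end

doc = REXML::Document.new(META_XML)
puts lookup(doc, "author") # prints "Jane Doe"
```

REXML resolves the dc: and office: prefixes in the XPath against the namespace declarations in the document itself, which is what lets the friendly-name table store plain prefixed names.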

The definition of the Meta class begins with the creation of a hash that maps friendly names to actual element names, plus a class constant to hold the base XPath for the metadata. The class constructor simply creates an REXML document from the XML source:

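require 'rexml/document'

module OOo
  class Meta

    # Friendly names mapped to the Dublin Core element names used in meta.xml
    NAME_MAP = {
      'description' => 'dc:description',
      'subject'     => 'dc:subject',
      'creator'     => 'dc:creator',
      'author'      => 'dc:creator',
      'date'        => 'dc:date',
      'language'    => 'dc:language',
      'title'       => 'dc:title'
    }

    # Base XPath for the metadata: the office:meta child of the root element
    XPATH_BASE = "*/office:meta"

    def initialize( src )
      @doc = REXML::Document.new( src.to_s )
    end

    # ... the methods shown below continue the class body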

We can override the method_missing method, which every Ruby object inherits, so that rather than raising a NoMethodError (as it does by default), it checks whether the message sent to the object maps to some item in our metadata:

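def method_missing( name, *args )
  n = name.to_s            # name arrives as a Symbol, e.g. :title or :title=
  if is_assignment? n
    # Writer: "title=" maps to "dc:title"
    el = map_for_assignment n
    xpath = "#{XPATH_BASE}/#{el}"
    assign( xpath, *args )
  else
    # Reader: "title" maps to "dc:title"
    el = Meta.map_name n
    xpath = "#{XPATH_BASE}/#{el}"
    find( xpath )
  end
end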

The first argument to method_missing is a Symbol object, so our code grabs its string representation. The is_assignment? method simply checks whether the name ends with an = character. If this is an assignment request, map_for_assignment removes the trailing = following the metadata name and maps the friendly name to the actual Dublin Core element name; assign then updates the corresponding element in the REXML document:

def assign( xpath, val )
  # Raises NoMethodError if the document has no element at this XPath
  node = @doc.elements.to_a( xpath )[0]
  node.text = val
end
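The assign method assumes the element already exists, so setting, say, a description on a file that has no dc:description element fails. One way to handle that, sketched here as a hypothetical variant rather than part of OOo4R, is to create the element when the lookup comes back empty. The function name `assign_or_create` and the sample document are assumptions.

```ruby
require "rexml/document"

XPATH_BASE = "*/office:meta" # same base XPath as the Meta class

# Hypothetical variant of assign: create the element if the file lacks it.
def assign_or_create(doc, element_name, value)
  meta = doc.elements[XPATH_BASE]
  raise ArgumentError, "no office:meta element found" unless meta
  node = meta.elements[element_name] || meta.add_element(element_name)
  node.text = value
  node
end

doc = REXML::Document.new(<<~XML)
  <office:document-meta xmlns:office="http://openoffice.org/2000/office"
                        xmlns:dc="http://purl.org/dc/elements/1.1/">
    <office:meta><dc:title>Old title</dc:title></office:meta>
  </office:document-meta>
XML

assign_or_create(doc, "dc:title", "New title")     # updates the existing element
assign_or_create(doc, "dc:description", "Summary") # adds a missing element
```

The new element keeps its dc: prefix, and REXML resolves that prefix through the declaration on the root element, so later XPath lookups find it like any other Dublin Core element.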

If this does not appear to be an assignment, the code tries to read some metadata. As before, the name is mapped, but now the code calls find:

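def find( xpath )
  @doc.elements.to_a( xpath.to_s )[0].text
rescue StandardError
  # Re-raise with the offending XPath in the message, keeping the original backtrace
  raise( RuntimeError, "Error with xpath '#{xpath}': #{$!}", $@ )
end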

# Helper methods omitted ...

  end
end
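Because the helper methods are omitted, here is one plausible way to write them, offered only as a sketch. The names is_assignment?, map_for_assignment and Meta.map_name come from the code above, but their bodies, and the choice to raise NoMethodError for unknown names, are assumptions.

```ruby
module OOo
  class Meta
    # Repeated (abbreviated) from the class definition so this sketch runs alone.
    NAME_MAP = { "title" => "dc:title", "creator" => "dc:creator",
                 "author" => "dc:creator" }.freeze

    # Hypothetical: map a friendly name such as "title" to "dc:title".
    def self.map_name(name)
      NAME_MAP.fetch(name) do
        raise NoMethodError, "undefined metadata attribute '#{name}'"
      end
    end

    private

    # Hypothetical: writer messages arrive as e.g. :title=, i.e. "title=".
    def is_assignment?(name)
      name.end_with?("=")
    end

    # Hypothetical: strip the trailing "=" and map the friendly name.
    def map_for_assignment(name)
      Meta.map_name(name.chomp("="))
    end
  end
end
```

Raising NoMethodError for unmapped names keeps the object's behavior close to what Ruby would do without method_missing, so a typo such as `meta.titel` still fails loudly.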


The same technique works for accessing the other metadata elements, though there are special cases where the metadata is contained in a series of child elements, such as the list of keywords. To save a modified OOo document, update the contents of the zip archive and write it back to disk. Ruby has no built-in zip archive class, but a library such as rubyzip can handle this step.
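For one such special case, our understanding is that OOo 1.x stores keywords as meta:keyword children of a meta:keywords element. The sketch below reads and rewrites that list with REXML; the sample document and the helper names `keywords` and `set_keywords` are hypothetical.

```ruby
require "rexml/document"

doc = REXML::Document.new(<<~XML)
  <office:document-meta xmlns:office="http://openoffice.org/2000/office"
                        xmlns:meta="http://openoffice.org/2000/meta">
    <office:meta>
      <meta:keywords>
        <meta:keyword>ruby</meta:keyword>
        <meta:keyword>xml</meta:keyword>
      </meta:keywords>
    </office:meta>
  </office:document-meta>
XML

# Hypothetical reader: collect the text of every keyword child.
def keywords(doc)
  doc.elements.to_a("*/office:meta/meta:keywords/meta:keyword").map(&:text)
end

# Hypothetical writer: replace all keyword children with a new list.
def set_keywords(doc, list)
  meta = doc.elements["*/office:meta"]
  container = meta.elements["meta:keywords"] || meta.add_element("meta:keywords")
  container.elements.delete_all("meta:keyword")
  list.each { |word| container.add_element("meta:keyword").text = word }
end

puts keywords(doc).inspect # prints ["ruby", "xml"]
set_keywords(doc, %w[ooo ruby rexml])
```

A multi-valued field like this maps naturally to a Ruby Array, which is why a simple one-element reader and writer is not enough here.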


Because the OOo file format is fully documented XML, OOo files can be created or manipulated without running OOo itself. Ruby's bundled XML library, REXML, together with the language's dynamic nature, makes Ruby a natural fit for OOo tasks.

James Britt runs Neurogami, LLC, a software and design company in Scottsdale, Arizona. He has coauthored a book on XML for Wrox Press, has written various articles on software development and gave a presentation on Ruby and XML at the Third International Ruby Conference in Austin, Texas. He can be reached at




Re: Manipulating OOo Documents with Ruby


I must point out that there is at least one mistake in the article: Sean Russell is the author of REXML. I had hoped the online article might have been edited with the correct information, but in the meantime please note the correction.


James Britt