Manipulating OOo Documents with Ruby

Who says you have to wait for some future OS to integrate your office documents with business applications you develop? Work with OpenOffice.org's XML-based documents using Ruby.

The full metadata file is available from the Linux Journal FTP site [ftp.linuxjournal.com/pub/lj/listings/issue119/7236.tgz].

In addition to a main Document class, OOo4R defines a meta class to encapsulate the metadata. A meta class uses an REXML document to hold the contents of meta.xml. A meta object largely is a collection of attributes. Typical usage either would be asking an object for a particular value, such as the name of the author, or assigning a value, such as a new title. One way to code this would be to write a series of explicit attribute accessor methods. We would need two methods for every attribute. Or, we could use dynamic method invocation by grabbing accessor messages, finding a matching meta attribute and either performing the requested action on the corresponding attribute or raising an exception.

The following code example focuses on the Dublin Core metadata elements used in OOo. The Dublin Core Metadata Initiative is an open forum for defining metadata standards. Dublin Core elements often can be found in RSS feeds and some XHTML documents. As with all elements in an OpenOffice.org XML file, the elements have a namespace prefix. Rather than have users know and use these prefixes, we can map the full element name to something friendly.

The definition of the Meta class begins with the creation of a hash that maps friendly names to actual element names, plus a class constant to hold the base XPath for the metadata. The class constructor simply creates an REXML document from the XML source:


module OOo
  class Meta

  NAME_MAP = {
   'description' => 'dc:description',
   'subject'     => 'dc:subject',
   'creator'     => 'dc:creator',
   'author '     => 'dc:creator',
   'date'        => 'dc:date',
   'language'    => 'dc:language',
   'title'       => 'dc:title'
  }
    XPATH_BASE  = "*/office:meta"

    def initialize( src )
      @doc = REXML::Document.new(  src.to_s )
    end

We can redefine the method_missing method available to all Ruby classes so that, rather than raising an exception (as it would do by default), it looks to see if the message sent to the object maps to some item in our metadata:

def method_missing( name, *args )
  n = name.to_s
  if is_assignment? n
    el = map_for_assignment n
    xpath = "#{XPATH_BASE}/#{el}"
    assign( xpath, *args)
  else
    el = Meta.map_name n
    xpath = "#{XPATH_BASE}/#{el}"
    find( xpath  )
  end
end

The first argument to method_missing is a symbol object, so our code grabs the string representation. The is_assignment method simply checks if the name ends with an = character. If this is an assignment request, then map_for_assignment removes any trailing characters following the metadata name and maps the friendly name to the actual Dublin Core element name; assign updates the corresponding element in the REXML document:

def assign( xpath, val )
  node = @doc.elements.to_a( xpath )[0]
  node.text = val
end

If this does not appear to be an assignment, the code tries to read some metadata. As before, the name is mapped, but now the code calls find:

def find( xpath )
 begin
  return @doc.elements.to_a( xpath.to_s )[0].text
 rescue Exception
  raise OOo::OOoException.new(
     "Error with xpath '#{xpath}': #{$!}", $@ )
 end
end

# Helper methods omitted ...

 end
end

The technique works for accessing the other metadata elements, though there are special cases where the metadata is contained in a series of child elements. Updating the zip file contents and writing the zip file back to disk using Ruby's built-in Zip class, lets us save modified OOo documents.

Summary

Because the OpenOffice.org file format uses a fully documented XML format, OOo files may be created or manipulated without requiring OOo itself. Ruby's built-in XML handling and dynamic nature make it a natural fit for OOo tasks.

James Britt runs Neurogami, LCC, a software and design company in Scottsdale, Arizona. He has coauthored a book on XML for the Wrox Press, written various articles on software development and gave a presentation on Ruby and XML at the Third International Ruby Conference in Austin, Texas. He can be reached at jamesgb@neurogami.com.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Manipulating OOo Documents with Ruby

Anonymous's picture

I must point out that there is at least one mistake in the article. Sean Russell is the author of REXML. I had hoped the online article might had been edited with the proper information, but in the meantime please note the correction.

Thanks,

James Britt

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix