Manipulating OOo Documents with Ruby

Who says you have to wait for some future OS to integrate your office documents with business applications you develop? Work with OpenOffice.org's XML-based documents using Ruby.

The full metadata file is available from the Linux Journal FTP site [ftp.linuxjournal.com/pub/lj/listings/issue119/7236.tgz].

In addition to a main Document class, OOo4R defines a meta class to encapsulate the metadata. A meta class uses an REXML document to hold the contents of meta.xml. A meta object largely is a collection of attributes. Typical usage either would be asking an object for a particular value, such as the name of the author, or assigning a value, such as a new title. One way to code this would be to write a series of explicit attribute accessor methods. We would need two methods for every attribute. Or, we could use dynamic method invocation by grabbing accessor messages, finding a matching meta attribute and either performing the requested action on the corresponding attribute or raising an exception.

The following code example focuses on the Dublin Core metadata elements used in OOo. The Dublin Core Metadata Initiative is an open forum for defining metadata standards. Dublin Core elements often can be found in RSS feeds and some XHTML documents. As with all elements in an OpenOffice.org XML file, the elements have a namespace prefix. Rather than have users know and use these prefixes, we can map the full element name to something friendly.

The definition of the Meta class begins with the creation of a hash that maps friendly names to actual element names, plus a class constant to hold the base XPath for the metadata. The class constructor simply creates an REXML document from the XML source:


module OOo
  class Meta

  NAME_MAP = {
   'description' => 'dc:description',
   'subject'     => 'dc:subject',
   'creator'     => 'dc:creator',
   'author '     => 'dc:creator',
   'date'        => 'dc:date',
   'language'    => 'dc:language',
   'title'       => 'dc:title'
  }
    XPATH_BASE  = "*/office:meta"

    def initialize( src )
      @doc = REXML::Document.new(  src.to_s )
    end

We can redefine the method_missing method available to all Ruby classes so that, rather than raising an exception (as it would do by default), it looks to see if the message sent to the object maps to some item in our metadata:

def method_missing( name, *args )
  n = name.to_s
  if is_assignment? n
    el = map_for_assignment n
    xpath = "#{XPATH_BASE}/#{el}"
    assign( xpath, *args)
  else
    el = Meta.map_name n
    xpath = "#{XPATH_BASE}/#{el}"
    find( xpath  )
  end
end

The first argument to method_missing is a symbol object, so our code grabs the string representation. The is_assignment method simply checks if the name ends with an = character. If this is an assignment request, then map_for_assignment removes any trailing characters following the metadata name and maps the friendly name to the actual Dublin Core element name; assign updates the corresponding element in the REXML document:

def assign( xpath, val )
  node = @doc.elements.to_a( xpath )[0]
  node.text = val
end

If this does not appear to be an assignment, the code tries to read some metadata. As before, the name is mapped, but now the code calls find:

def find( xpath )
 begin
  return @doc.elements.to_a( xpath.to_s )[0].text
 rescue Exception
  raise OOo::OOoException.new(
     "Error with xpath '#{xpath}': #{$!}", $@ )
 end
end

# Helper methods omitted ...

 end
end

The technique works for accessing the other metadata elements, though there are special cases where the metadata is contained in a series of child elements. Updating the zip file contents and writing the zip file back to disk using Ruby's built-in Zip class, lets us save modified OOo documents.

Summary

Because the OpenOffice.org file format uses a fully documented XML format, OOo files may be created or manipulated without requiring OOo itself. Ruby's built-in XML handling and dynamic nature make it a natural fit for OOo tasks.

James Britt runs Neurogami, LCC, a software and design company in Scottsdale, Arizona. He has coauthored a book on XML for the Wrox Press, written various articles on software development and gave a presentation on Ruby and XML at the Third International Ruby Conference in Austin, Texas. He can be reached at jamesgb@neurogami.com.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Manipulating OOo Documents with Ruby

Anonymous's picture

I must point out that there is at least one mistake in the article. Sean Russell is the author of REXML. I had hoped the online article might had been edited with the proper information, but in the meantime please note the correction.

Thanks,

James Britt

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState