The OASIS Standard for Office Documents: How All Users and Developers Can Benefit
At a lower level, what is needed to manage OASIS files in a larger application, where the source language usually is C or C++ and the performance must be maximized? First of all, the program must include the proper library to compress and uncompress zipped files. This is not an OASIS-specific issue, so we won't deal with it further.
Once the individual XML files are available, they have to be loaded in a way that understands and exposes their internal structure, that is, the relationships among the elements. Once this step has been performed, the data can be converted or processed in any manner. Many tools for this already exist. Several of them are designed to support general XML rather than the OASIS format specifically, but the difference is smaller than one might expect, and the situation is expected to improve soon after the standard is released.
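As a minimal sketch of the two steps just described, unzipping and then loading into a tree, the following Python fragment uses only the standard library. The member name content.xml matches what OpenOffice.org packages contain, but the element names (document, body, paragraph, heading) are hypothetical, heavily simplified stand-ins for the real vocabulary, and the helper functions are invented for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the XML found inside an office file.
SAMPLE_XML = b"""<document>
  <body>
    <heading level="1">Report</heading>
    <paragraph>First paragraph.</paragraph>
    <paragraph>Second paragraph.</paragraph>
  </body>
</document>"""


def build_sample_package() -> bytes:
    """Create an in-memory zip archive holding a single XML member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", SAMPLE_XML)
    return buffer.getvalue()


def load_tree(package: bytes, member: str = "content.xml") -> ET.Element:
    """Unzip one member and parse it into an in-memory element tree."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return ET.fromstring(archive.read(member))


def paragraph_texts(root: ET.Element) -> list:
    """Walk the whole tree and collect the text of every <paragraph>."""
    return [node.text for node in root.iter("paragraph")]


if __name__ == "__main__":
    root = load_tree(build_sample_package())
    print(root.tag, paragraph_texts(root))
```

Once the tree exists, relationships among elements can be queried directly, for example with root.find("body/heading"). The price is that the whole document sits in memory at once.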
Expat is a popular XML parser written in C. It is basic and lacks a validation capability, but it still is the fastest one around, and it has front ends for practically every language. A more featureful library, designed specifically for GNOME and supporting DTD validation, is Libxml. Like Expat, Libxml is written in C, is portable and can be used from many languages. The Xerces parser, in Java, also can generate and validate XML documents.
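To give an idea of what one of those front ends looks like, here is a small sketch using the Expat binding that ships with Python (xml.parsers.expat). The sample document and the two helper functions are made up for this illustration. The second helper shows the limitation mentioned above: Expat reports whether a document is well-formed, but it does not validate it against a DTD.

```python
import xml.parsers.expat

# Made-up sample input for the illustration.
SAMPLE_XML = b"<doc><p>one</p><p>two</p><note/></doc>"


def count_elements(data: bytes) -> dict:
    """Count how often each element name starts, using Expat callbacks."""
    counts = {}

    def on_start(name, attributes):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return counts


def is_well_formed(data: bytes) -> bool:
    """Check well-formedness only; no DTD validation takes place."""
    try:
        xml.parsers.expat.ParserCreate().Parse(data, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


if __name__ == "__main__":
    print(count_elements(SAMPLE_XML))
    print(is_well_formed(b"<doc><p></doc>"))
```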
In the Qt/KDE field, developers have at their disposal, besides the OOo plugin already mentioned, the related Qt classes and DOM implementation (QDom) to write or parse XML, as well as the KOffice DTD. At the time of this writing, these tools still target the KOffice XML format, but they are expected to converge on the OASIS standard.
For security-conscious developers, the easiest starting point is the C XML security library (XMLsec), based on Libxml2, which supports both signing and encryption of XML material. SAXEcho is a (mostly) Java program that attaches itself to a document open in a running OpenOffice.org and shows its XML tree representation. It also can validate or modify the document by operating directly on XML nodes, among several other nifty things.
Most of the tools described above work on an internal tree representation of the whole document. What should one do when developing applications that must deal with large documents? Keep in mind that large here means too big to fit into memory, a threshold that is not very high if this format must remain usable even in low-end desktop applications.
The current solutions in this space follow the so-called SAX (Simple API for XML) approach: instead of building the whole tree of a document in one fell swoop and keeping it there for further processing, go step by step. A SAX parser reads the document and, instead of keeping it all in memory, generates an event every time it finds something worthwhile. The parser then passes the event to event handlers that interact with the application. Such an event can be a document-type definition, an error or an element of the actual content. A good starting point for SAX-based programming is the SAX Project. SAX2 already is supported in Java through JAXP and in Perl through the Orchard Project, which is quite stable, not to mention fast and lightweight, as far as SAX and XML processing are concerned.
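The event-driven style can be sketched in a few lines. The example below uses Python's standard xml.sax module as a stand-in for the Java and Perl implementations named above. The handler class, the scan function and the element name paragraph are hypothetical and exist only for this illustration. The input is fed to the parser in small chunks, and the handler keeps nothing but two running totals, so memory use does not grow with the size of the document.

```python
import io
import xml.sax


class ParagraphCounter(xml.sax.ContentHandler):
    """Reacts to parser events and keeps only running totals."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.text_length = 0
        self._inside_paragraph = False

    def startElement(self, name, attrs):
        if name == "paragraph":
            self.paragraphs += 1
            self._inside_paragraph = True

    def endElement(self, name):
        if name == "paragraph":
            self._inside_paragraph = False

    def characters(self, content):
        # May be called several times for one run of text.
        if self._inside_paragraph:
            self.text_length += len(content)


def scan(stream, chunk_size=64):
    """Feed the stream to the parser piece by piece; no tree is built."""
    handler = ParagraphCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.close()
    return handler


if __name__ == "__main__":
    data = (b"<document>" + b"<paragraph>hello</paragraph>" * 3
            + b"<heading>skip</heading></document>")
    result = scan(io.BytesIO(data), chunk_size=16)
    print(result.paragraphs, result.text_length)
```

The trade-off with respect to a tree is that the application sees each element only once, in document order, so any context it needs later has to be remembered explicitly by the handler.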
All the research done for this article confirmed one of my first impressions: so far, the free software/open-source software approach to guarantee information interchange has been to develop cross-platform applications, which are difficult to maintain and optimize for each target environment. Now it looks like we are starting to do the right thing, which is to define truly Free, standard, toolkit-independent, cross-platform formats that leave everyone free to create any possible front end to read and write them.
Thanks above all to Gary Edwards and David Faure for all the material and explanations. Pierre Souchay (kfile-plugin-ooo) and the AbiWord developers also were very helpful.
Resources
AbiWord: www.abisource.com
CPAN: www.cpan.org
EU-IDA: europa.eu.int/ISPO/ida
Expat: expat.sourceforge.net
kfile-plugin-ooo: bad.sheep.free.fr/kfile-plugin-ooo.html
KOffice: koffice.kde.org
KOffice DTD: www.koffice.org/DTD/kword-1.2.dtd
Libxml: xmlsoft.org
OASIS Office File Format TC: www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
OASIS Web Site: www.oasis-open.org
OOo2sDbk: www.chez.com/ebellot/ooo2sdbk
ooo2txt: ooo2txt.free.fr
OpenOffice.org: www.openoffice.org
Orchard: orchard.sourceforge.net
QDom: doc.trolltech.com/3.1/xml-tools.html
RTF2XML: www.xmeta.com/omlette/rtf2xml
SAXEcho: xml.openoffice.org/saxecho
SAX Project: www.saxproject.org
SIAG, O3read, o3totxt, o3tohtml: siag.nu
soffice2html: hoopajoo.net/projects/soffice2html.html
Stop Word Attachments: www.gnu.org/philosophy/no-word-attachments.html
TclXML: tclxml.sourceforge.net
Writer2LaTeX: www.hj-gym.dk/~hj/writer2latex
Xerces: xml.apache.org/xerces-j
XMerge: xml.openoffice.org/xmerge
XMLsec: www.aleksey.com/xmlsec
Marco Fioretti is a hardware systems engineer interested in free software both as an EDA platform and, as the current leader of the RULE Project, as an efficient desktop. Marco lives with his family in Rome, Italy.
Articles about Digital Rights and more at http://stop.zona-m.net. CV, talks and bio at http://mfioretti.com.
Comments
just a dream
I hope some day it won't matter if I work on my text with KWord, AbiWord, OpenOffice Writer or ... Microsoft Word