The OASIS Standard for Office Documents: How All Users and Developers Can Benefit
Listing 1. Extracting the Text of an OASIS Presentation at the Command Line
Problems with a lot of the bang up to date mainstream Free SW
Requires modern HW:
plenty of RAM
fast CPUs
big Hard disk drives
Unlike SW, modern HW cannot be "free as free beer"
Doesn't this sound familiar?
Encryption is supported, and any paragraph can have an identity attribute. Through this feature, different users can be granted access to different parts of the same document based on their privileges. The default text encoding is UTF-8, although other encodings can be chosen. Suggestions to improve the standard can be posted to firstname.lastname@example.org.
The design is all well and good, but users need applications that implement it. What is available? Complete compatibility is possible only with software designed from scratch or with software that has been modified thoroughly to achieve it. In general, existing programs and their developers may have to compromise between the standard and their current concept of the perfect structure of the perfect document. For example, margins are a section or page property in some applications and a paragraph property in others.
That said, users of OpenOffice.org will have the easiest time; the OASIS standard is built on the current OOo formats and will be almost identical to them. AbiWord has both import and less-advanced export filters, but they are not 100% complete. Contributions to improve them are extremely welcome. This program also offers end users the option to use the OOo format as the default file format. The plan for KOffice, after improving the filters for version 1.3, is to start the switch to OASIS as the native format of future releases. David Faure, one of the chief KOffice developers, also is a member of the Technical Committee, and he foresees no real obstacles to complete support of the standard, in spite of the frame-oriented rather than page-oriented paradigm used in KOffice.
SIAG offers some support for reading text and spreadsheet files through external applications, but nothing is available for writing. Emacs surely will come up with its own OASIS mode sooner or later, and WordPerfect also is officially represented in the Technical Committee. In short, things look good. Choice already is offered, and the only things left are to set OASIS as the default save format and to refuse to receive or send files in proprietary formats.
A lot of code already is available to study and reuse for processing the OASIS file format. Whatever you choose, don't forget the standard itself and the main point—format and applications shall remain separated. If you want to improve the first, submit proposals as explained above. If you want faster or more featureful code, do it yourself or help the developer(s) of the corresponding application without touching the format or inventing a new one.
Several standalone filters already are available to move back and forth between OASIS/OOo files (or XML in general) and other formats. The utilities RTF2XML, ooo2txt, SIAG, O3read, o3totxt, o3tohtml, OOo2sDbk, Writer2LaTeX and soffice2html (see Resources) together cover RTF, (X)HTML, LaTeX, DocBook and, of course, plain text.
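To give an idea of what the simplest of these filters do, here is a minimal Python sketch of a plain-text extractor. It is not the code of any of the utilities listed above. It assumes that the document is a zip archive whose text lives in a member called content.xml, as is the case for OOo and OASIS files, and that paragraphs and headings are elements whose local names are p and h. The function names extract_text and local_name are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text(path):
    """Return the text of every paragraph and heading in content.xml, one per line.

    Assumption: the file at 'path' is a zip archive with a content.xml member.
    Matching on local names only keeps the sketch independent of the exact
    namespace URIs, which differ between OOo 1.x and OASIS files.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    lines = []
    for element in root.iter():
        if local_name(element.tag) in ("p", "h"):
            # itertext() also collects the text of nested spans and links.
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)


# Hypothetical usage; the file name is only an example:
# print(extract_text("slides.sxi"))
```

The sketch deliberately ignores styles, the special elements used for tabs and repeated spaces, and paragraphs nested inside text boxes (whose text would appear twice), all of which a real filter has to handle.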
CPAN hosts several Perl modules useful for OASIS-related processing. OpenOffice::Parse::SXC parses OOo spreadsheets, making the text value of each cell (but nothing else) available for the main script. It comes with a utility to convert OOo spreadsheets to CSV format. Another Perl module, XML::Excel, can transform Excel spreadsheets into plain XML, dumping them into an intermediate structure for custom processing, if necessary. On the server side, Apache::AxKit::Provider::OpenOffice extracts the content of text (.sxw) files.
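The same "text value of each cell" approach can be sketched in a few lines of Python. This is an illustration only, not a port of OpenOffice::Parse::SXC or of its CSV utility. It assumes that the spreadsheet is a zip archive containing content.xml and that rows, cells and cell paragraphs are elements with the local names table-row, table-cell and p. The function name spreadsheet_to_csv is hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def spreadsheet_to_csv(path):
    """Return the text of every cell in content.xml as one CSV string.

    Simplifications of this sketch: all sheets end up in the same CSV,
    and attributes that repeat or span cells are ignored.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in root.iter():
        if local_name(row.tag) != "table-row":
            continue
        cells = []
        for cell in row:
            if local_name(cell.tag) != "table-cell":
                continue
            # A cell holds zero or more paragraphs; join them with newlines.
            paragraphs = [
                "".join(p.itertext()) for p in cell if local_name(p.tag) == "p"
            ]
            cells.append("\n".join(paragraphs))
        writer.writerow(cells)
    return output.getvalue()


# Hypothetical usage; the file name is only an example:
# print(spreadsheet_to_csv("budget.sxc"), end="")
```

Only the displayed text is exported; formulas, number formats and styles are dropped, just as the paragraph above says of the Perl module.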
Tcl has linters, DOMs and XSLT interfaces, as well as an API that allows switching to different parsers with no changes to the application code. When nothing else is available, a native Tcl parser is used; otherwise the developer can take advantage of both Expat and Libxml (see below).
PDA developers have a dedicated project, related to OASIS quite directly, called XMerge that currently is developed in Java for Palm and Pocket PC. Its purpose is to allow the editing of OOo documents (maybe previously converted to a more limited format) with PDA native applications, in such a way that any changes can be merged back into the original format without loss of style, formatting and so on.
Articles about Digital Rights and more at http://stop.zona-m.net
CV, talks and bio at http://mfioretti.com