The OASIS Standard for Office Documents: How All Users and Developers Can Benefit
Desktop integration begins with documents, not with any toolkit or bundle of applications. If files can be read and written by every application, users can communicate, work together and become integrated. In this sense, the OASIS XML format for office documents has the potential to be one of the most meaningful advances in free computing.
OASIS stands for Organization for the Advancement of Structured Information Standards. Formerly SGML Open, this nonprofit consortium, which includes such companies as IBM, Sun and Boeing, aims to create open standards for almost any kind of structured information. The one we cover here is an XML-based format common to all kinds of office files—text, spreadsheets, presentations and more.
The significance of an effort of this caliber to promote a file format, rather than any specific desktop, application or the Linux kernel itself, cannot be overestimated. Free as in free formats is even more important than free software. Only with free formats and the internal structuring that comes from XML can data be exchanged with new or different programs without any need for converters, or be directly edited, indexed, analyzed and exchanged between heterogeneous groups or servers, like Web services without the hype. Data will start belonging exclusively to end users.

Figure 1. Switching to OASIS and never going back: OpenOffice.org can convert all your closed-format documents to the new standard with only a few clicks.
The OASIS Office Technical Committee had its first meeting in February 2003. The official file format should be voted on in February 2004. After the approval, Phase 2 will start; its main goal will be to extend the base specification to additional areas of application. The real goal is the move to a document-centric model, independent from and available to any given program, regardless of its license. The Technical Committee is determined to abandon the assumption, common today, that every file spec must be bound to one application.
Some farsighted public administrations already have started to think in this way. The Swedish Agency for Public Management says, “[We] should also follow and if possible support work that takes place in OASIS....An open file format for office software is of great importance for increased interoperability” (www.openoffice.org/servlets/ReadMsg?msgId=585772&listName=discuss). At the European Union level, IDA (Interchange of Data between Administrations) decided in 2003 to carry out exploratory work on open document formats and on how public administrations could persuade software vendors to support them.
The standard conforms to general W3C specifications for XML technologies and covers every aspect of document usage. User interaction, for example, is described in XML schema templates, which operate like traditional API functions. Even these, however, now are independent of any single application.
A text format can be much larger and less efficient than an equally free but binary one. Even when the performance hit would be noticeable, however, the benefits simply are too great to give up. In itself, an OASIS office file (be it text, presentation or spreadsheet) is a zip archive: the compression format chosen is a compromise among efficiency, speed of accessing internal parts and algorithm license. Unzipping it, we first find five XML files: styles.xml, presentation and formatting; content.xml, actual contents; settings.xml, application settings such as zoom level and printer; meta.xml, language and encoding metadata; and manifest.xml, an explanation of what all the other files are and their relative paths.
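Because the container is an ordinary zip archive, any language with a zip library can look inside it. The following is a minimal sketch in Python, not part of the standard or of any office suite. It builds a small stand-in archive in memory so that it runs without a real document; the helper names and the placeholder file bodies are assumptions made for illustration.

```python
import io
import zipfile

# Member names follow the description above. Placing the manifest under
# META-INF/ mirrors what OpenOffice.org writes; the file bodies are
# placeholders, not real office XML.
SAMPLE_MEMBERS = {
    "content.xml": "<placeholder/>",
    "styles.xml": "<placeholder/>",
    "settings.xml": "<placeholder/>",
    "meta.xml": "<placeholder/>",
    "META-INF/manifest.xml": "<placeholder/>",
}


def build_sample_archive():
    """Return the bytes of a zip archive holding the placeholder members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in SAMPLE_MEMBERS.items():
            archive.writestr(name, body)
    return buffer.getvalue()


def list_members(data):
    """Return the sorted member names of a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    for name in list_members(build_sample_archive()):
        print(name)
```

To inspect a real document instead, read its bytes from disk and pass them to list_members.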
Other components (each in a predefined folder, so that even virus scanners have an easier time) may be macros, their dialogs and objects, such as charts or formulas.
Because the standard requires that all pieces be present in the zip archive, no information is lost: content, layout and everything else always travel together. Unlike some proprietary offerings in the same space, there is no restriction on which application must be employed to make full use of a document. WYSIWYG results are possible and can be specified fully or replaced in the styles.xml file. At the same time, however, content and presentation are decoupled; hence, the content alone is available to any application, for any conceivable use. kfile-plugin-ooo, for example, extracts all the metadata embedded in the new file format. The end user then can read, search by metadata or modify all this information straight from KOffice or Konqueror. This plugin also is included in the latest KOffice source trees.
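The same decoupling makes metadata extraction a few lines of code. The sketch below is not how kfile-plugin-ooo works internally; it only illustrates the idea. It assumes meta.xml holds simple leaf elements, ignores namespaces altogether and runs against a hypothetical sample built in memory. The sample's root element and field values are invented for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical meta.xml, invented for this example. Only the Dublin Core
# namespace URI is real; the root element name is a placeholder.
SAMPLE_META = """<meta xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Quarterly report</dc:title>
  <dc:language>en-US</dc:language>
</meta>"""


def build_sample_archive():
    """Return the bytes of a zip archive that contains only meta.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", SAMPLE_META)
    return buffer.getvalue()


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_metadata(data):
    """Collect the text of every leaf element in meta.xml into a dict."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    metadata = {}
    for element in root.iter():
        has_children = len(element) > 0
        text = (element.text or "").strip()
        if not has_children and text:
            metadata[local_name(element.tag)] = text
    return metadata


if __name__ == "__main__":
    for key, value in read_metadata(build_sample_archive()).items():
        print(key, "=", value)
```

Matching on local names keeps the sketch independent of whichever namespace URIs a given version of the format uses, at the cost of conflating elements that share a name across namespaces.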
Text format and internal structure make decades of UNIX experience in processing and generating text come back with a vengeance to tame complex, WYSIWYG office documents of every kind. Shell one-liners, Web spiders and so on can query and process single documents or whole classes of them directly, much like a database engine. Viewing attached presentations as text in mutt or in industry-level content management systems becomes easier. As a proof of concept, I was able to get the (admittedly rough) outline of Listing 1 from a presentation simply by typing:
# tr "<" "\012" < content.xml | grep ^text | cut '-d>' -f2 | uniq
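The same rough outline can be obtained with a real XML parser instead of text splitting. The following Python sketch is only an approximation of the one-liner above: it assumes headings and paragraphs are elements whose local names are h and p, and it runs on a hypothetical content.xml fragment whose namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical fragment, invented for this example. The namespace URI is a
# placeholder, and the element names h and p are assumptions.
SAMPLE_CONTENT = """<doc xmlns:text="urn:example:text">
  <text:h>Agenda</text:h>
  <text:p>First point</text:p>
  <text:p>First point</text:p>
  <text:p></text:p>
  <text:p>Second point</text:p>
</doc>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def outline(xml_text, wanted=("h", "p")):
    """Return the text of heading and paragraph elements, in document order.

    Empty entries are skipped and adjacent duplicates are collapsed, as
    uniq does in the shell pipeline.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) in wanted:
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return [line for line, _ in groupby(lines)]


if __name__ == "__main__":
    for line in outline(SAMPLE_CONTENT):
        print(line)
```

A parser copes with attributes, nested markup and line breaks inside elements, which the tr and cut approach does not, so it is the safer choice once the quick look has proved the idea.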
Articles about Digital Rights and more at http://stop.zona-m.net CV, talks and bio at http://mfioretti.com
Comments
just a dream
I hope some day it won't matter if I work on my text with KWord, AbiWord, OpenOffice Writer or ...... Microsoft Word