OpenOffice.org in the Limelight
OpenOffice.org is a great set of software, consisting of several useful components that offer a lot of options. It is customizable and introduces many open formats for documents. In order to adapt the basic configurations to your particular needs, OpenOffice.org allows you to prepare macros and additional scripts.
I work as an editor at a Polish free software magazine. At the beginning of the editorial process, the author supplies the text and the editor edits it. Editing means removing common content-related and formal mistakes or errors, as well as preparing the text in a standard form to make it easier to process at further stages. The proofreader then corrects the text and the editor looks through it again and makes the final changes. Finally, the typesetter prepares the text for printing, and the editor checks the entire work one last time.
The processed text is in a different format at each stage of this process. Our publishing house prefers open formats for documents, so our authors deliver the documents in text or HTML formats and the graphics in PNG or EPS formats. After editing the document, the editor sends a copy to the author—that copy is in HTML. Our proofreaders work on Microsoft Windows systems and use Microsoft Word, so they need the documents to be in the .doc file format. Our typesetters work on Macintosh systems and use QuarkXPress. They need two kind of documents: Microsoft Word files for printing and checking the required formats for the article and Macintosh text files for opening the files in Quark and processing them.
When our quarterly started in autumn 2000, I was using StarOffice. Since then, I switched to OpenOffice.org. The methods to work with authors' text files are similar for StarOffice and OpenOffice.org. I import the document in text or HTML format using StarWriter (previously) or OpenOffice.org Writer (at present), and—after processing it—I export it to HTML, Microsoft Word or the corresponding SDW or SXW file formats.
If a source file is prepared well, there should be no problems when importing it. If a file is damaged, it must be repaired. This is not difficult to do if you take into account the open formats of the documents.
Once a file is imported, you need to change it to the proper format. The editors of Polish, German, French or other non-English language publications should change the codepage as well. A standard codepage for Polish documents, for example, is ISO-8859-2, and the standard codepage for all OpenOffice.org documents is UTF-8. To convert imported documents in a convenient way, you need a macro. The macros I've built for OpenOffice.org consist of several codepage converters, including converters from ISO-8859-2 to UTF-8 and vice versa.
Paragraphs in text files written in some text editors may be broken into a number of lines. To consolidate them, you need to use the KillparZ macro, which is an improved version of the killpars macro by Andrew Brown (Figure 1). KillparZ is a component of the ooo-macro bundle.
Assuming the author of the document declared the appropriate charset, there shouldn't be a problem with the codepage when you import an HTML file. But another problem may arise—the shortcuts associated with your macros stop working in HTML documents. To make macros work, you need to create an empty OpenOffice.org Writer document, open the HTML file, copy it, close the HTML file and, finally, paste the content into the Writer document.
Our magazine is published in Polish, so I need to use more sophisticated methods when exporting files. Specifically, I need to use fonts with Polish diacritics. My tests of StarWriter and OpenOffice.org Writer have shown that if you want to avoid problems related to codepages in non-English language documents, you should use TrueType fonts instead of Type1 fonts. Moreover, you obtain the best effects of exporting documents to the Microsoft Word format if you use the same fonts as are used in Microsoft Windows. The Microsoft fonts, bundled in Microsoft FontPack, including Times New Roman, Arial and Courier New, are sufficient in most cases.
The authors of StarOffice and OpenOffice.org had to use some reverse engineering to discover how the Microsoft Word format is constructed. As a result, the export filter from Writer to Word works well but not perfectly. Therefore, if you want to exchange standard document types with other users, prepare one typical document using all the necessary formatting, including headers, italics and boldface. Then make the sample available to coworkers and ask them if everything works well.
The articles we publish are a simple kind of document. Our editorial office uses the three above-mentioned fonts, as well as italic and bold, two levels of headers and straight tables. We do not include the graphics in our documents; we simply list the names of the files in PNG or EPS format. Such documents can be exported from SDW or SXW formats to Microsoft Word without any problems.

Figure 2. An HTML file as exported by OpenOffice.org—it uses styles, classes and a lot of other unwanted formatting.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Introduction to MapReduce with Hadoop on Linux
- RSS Feeds
- New Products
- Weechat, Irssi's Little Brother
- Validate an E-Mail Address with PHP, the Right Way
- Tech Tip: Really Simple HTTP Server with Python
- Poul-Henning Kamp: welcome to
20 min 11 sec ago - This has already been done
21 min 11 sec ago - Reply to comment | Linux Journal
1 hour 6 min ago - Welcome to 1998
1 hour 54 min ago - notifier shortcomings
2 hours 18 min ago - heroku?
3 hours 55 min ago - Android User
3 hours 57 min ago - Reply to comment | Linux Journal
5 hours 50 min ago - compiling
8 hours 39 min ago - This is a good post. This
13 hours 52 min ago
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?







Comments
Are the Perl scripts available?
Interesting reading - I was particularly interested in the way that 'clean' HTML can be obtained from the raw HTML output of Ooo writer. Are the Perl scripts available?
TIA
H.
It looks like these perl scri
It looks like these perl scripts manipulate tags using regular expressions. There are too many edge cases for this to be a good idea. Much better would be to properly parse the HTML, tidy it and walk the DOM tree to remove unwanted elements. The HTML section at CPAN looks like a good place to start doing that. Mind you, there are also interfaces to manipulate OOo documents directly. Good luck, happy hacking!