OpenOffice.org: Sun PDF Import Extension
The Sun PDF Import Extension is one of the most popular OpenOffice.org extensions ever created. For the last two years, it has been near the top of the list of most popular downloads on the OpenOffice.org Extensions site -- and no wonder, considering that it is a free replacement for Adobe Acrobat, which is currently priced at US$449. However, the extension does have some quirks and limitations that you have to learn to work around.
The first quirk you have to overcome is obtaining it. To start with, you need to be running OpenOffice.org 3.0 or higher.
That is probably not a problem for most users, but finding a usable copy of the extension may be. When you click the Get it! button on the extensions site, the link takes you to a page about Oracle Open Office, the successor to Sun Microsystems' StarOffice. This page mentions the PDF Import Extension, but provides no downloads.
To download the extension, you need to be alert when your browser switches to the page that thanks you for downloading, and choose a manual download before you can get the file.
Even then, to judge from the comments on the extensions page (and my own experience), you may have trouble using the extension after you install it from Tools -> Extension Manager. The easiest way to get the extension is to check your distribution's repository to see if it is included as a package, as in Debian.
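If you prefer the command line, the Extension Manager has a counterpart called unopkg that ships with OpenOffice.org. What follows is only a minimal sketch in Python, assuming that unopkg is on your path; the file name pdfimport.oxt in the usage note is a placeholder, not necessarily the name of the file you downloaded, and the helper names are my own.

```python
import shutil
import subprocess
from pathlib import Path


def build_install_command(oxt_path, unopkg="unopkg"):
    """Return the argument list for installing an extension with unopkg."""
    path = Path(oxt_path)
    if path.suffix.lower() != ".oxt":
        raise ValueError(f"expected an .oxt extension file, got {path.name!r}")
    # "unopkg add <file>" installs the extension for the current user.
    return [unopkg, "add", str(path)]


def install_extension(oxt_path):
    """Run unopkg if it can be found; return True if it reports success."""
    if shutil.which("unopkg") is None:
        # Assumption: unopkg is on the PATH. If not, fall back to the GUI.
        print("unopkg not found; use Tools -> Extension Manager instead")
        return False
    result = subprocess.run(build_install_command(oxt_path))
    return result.returncode == 0
```

Calling install_extension("pdfimport.oxt") does the same job as adding the file in the Extension Manager, so it is unlikely to help if the downloaded file itself is the problem.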
You will know that the installation succeeded if you open a PDF file and it displays in Draw.
By contrast, if you get a few characters of gibberish, you need to keep searching for another way of getting the extension. You might be able to find an alternative download site with an earlier version that you can use. Don't worry if the version number is far below the 1.01 release mentioned on the extension page; the version numbers took a huge, unwarranted leap, and (so far as I can tell) a .4x version will not be much different in functionality from the 1.01 release.
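Before blaming the extension for gibberish, it may be worth ruling out a bad test file. A PDF file announces itself with the signature %PDF- near its start, so a quick check is possible. The function name below is my own, and this is only a rough sanity check under that one assumption, not a validator.

```python
def looks_like_pdf(path):
    """Return True if the PDF signature appears in the first 1024 bytes."""
    with open(path, "rb") as handle:
        # Normally the signature is the very first thing in the file;
        # scanning the first kilobyte tolerates a little leading junk.
        return b"%PDF-" in handle.read(1024)
```

If looks_like_pdf returns False for your test file, try a different PDF before you go hunting for another copy of the extension.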
Using the extension
Once you have the Sun PDF Import Extension installed, you need to know its limitations. Unfortunately, it's a mixture of good and bad news.
The good news is that the extension works extremely well with text, preserving all types of formatting, including font size, bold, italics, strike-through and underlining. Fonts, too, are preserved, although their names are not always parsed correctly and may have a few additional characters at the end. Should the fonts not be available on your system, the extension tries to replace them with a font whose characters are metrically equivalent. The positioning of text is also maintained in all-text documents, so that a brochure that has text scattered over the page is imported as accurately as a white paper that is a solid block of paragraphs.
The extension places each line of text in a separate text frame. Each fragment of a line separated by a tab or spacing is also placed in its own text frame. This arrangement means that you can easily correct typos, or add a few words if the line is short. Add much more, and you will throw off the line spacing in the document. You can, of course, add your own text frames, but you will have to work carefully not to interfere with the line spacing or the bottom margin -- to say nothing of moving every line carefully downwards. Still, the effort may be worthwhile if you need to edit or recover an important document.
Another problem is that true Adobe forms and graphics are not imported at all. At the most, you will have only their frames, and, at times, especially with PNG graphics, the positioning of text will be thrown off by the missing elements. In these cases, if you want to include the forms or graphics included in a PDF made outside of OpenOffice.org, then you will have to capture them and insert them manually into the Draw document.
If you import a PDF created within OpenOffice.org, you may be able to import forms and graphics -- providing that you set the PDF to Hybrid format when you exported the file. A Hybrid PDF combines Acrobat and Open Document formats. A PDF reader like Adobe Acrobat that cannot parse Open Document Format will simply ignore it, but, when you come to import the file into OpenOffice.org for editing, the forms and graphics will be imported along with the text. The cost of using Hybrid format is that your files will be an average of about 20% larger, but that is a relatively small price to pay for the convenience of the kludge.
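To put the 20% figure in concrete terms, here is a small sketch in Python. The overhead is only the rough average mentioned above, so treat the result as an estimate; the function name is my own.

```python
def estimated_hybrid_size(plain_pdf_bytes, overhead=0.20):
    """Estimate the size of a Hybrid PDF from the size of a plain export."""
    # overhead is the approximate average increase, not a guaranteed value.
    return round(plain_pdf_bytes * (1 + overhead))


# A plain export of 500,000 bytes would grow to roughly 600,000 bytes.
print(estimated_hybrid_size(500_000))
```

In other words, a half-megabyte PDF gains about 100KB, which is unlikely to matter unless you are mailing large numbers of files.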
Finally, when you are finished editing, remember not to save the file in the usual way, which would give you a Draw document rather than a PDF. Use File -> Export to PDF instead.
The extension's future
Sun PDF Import Extension is not the most elegant solution available, but it does an ingenious job of working with existing features, and obviously fills a need that many people have.
That is why free desktop users should be disturbed that Oracle, the new custodian of the extension, is not making it available for downloading. Not only is this situation a possible violation of the extension's licensing, but, although the source code is available, I have yet to hear of anyone stepping up to maintain the extension for the general community.
A real possibility exists that, a few versions from now, Sun PDF Import Extension will no longer be available. If that happens, one of OpenOffice.org's major advantages will be lost -- and that would be inexcusable.
Let's hope that someone -- either Oracle or some concerned coders -- keeps Sun PDF Import Extension going. Despite its rough edges, it is considerably better than nothing.
Note: This is one of my last columns on OpenOffice.org. After six years of writing these columns, I am starting to scrape the bottom of the barrel. However, sometime in the next few months, I will be starting another series of columns about using major applications on the free desktop.
If you have any OpenOffice.org topic that I haven't covered, or any suggestions for which application to discuss first, please add your suggestion in a comment below.
-- Bruce Byfield (nanday)