Converting Office Documents

Now and then, office-type documents need to be converted. LaTeX users have always been able to produce a variety of formats from the command line, but for OpenOffice/LibreOffice users, manual labor has been the only option. That changes with unoconv, which lets you convert to most file formats directly from the command line.

Unoconv is handy for many tasks. I commonly use it to convert all documents in a directory to PDFs, or to MS Office-compatible formats for clients. The beauty of it is that these previously tedious tasks are now one-liners.

If you're on Ubuntu or one of its derivatives (I'm on Kubuntu), you can install unoconv from the command prompt:

$ sudo apt-get install unoconv

Having done that, you need to start the server half of unoconv.

$ unoconv --listener

Give this a few seconds to settle. It starts an instance of OpenOffice in the background which it ties into. To use this instance of OpenOffice for format conversion, now try the following:

$ unoconv -f pdf *.odp *.odt

This converts all text documents and presentations in the current directory to PDFs. There isn't much control over the process, but if the default output is what you want, it is a great help.
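If you would rather see which files fail than hand the shell one big glob, the same conversion can be wrapped in a small loop. This is only a sketch: convert_dir is a name made up for this example, it assumes the listener is already running, and it assumes unoconv exits with a non-zero status when a conversion fails.

```shell
# Sketch: convert every .odt and .odp file in a directory to PDF.
# convert_dir is a hypothetical helper, not part of unoconv.
convert_dir() {
    dir=${1:-.}
    for f in "$dir"/*.odt "$dir"/*.odp; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] || continue
        # Assumption: unoconv signals a failed conversion via its exit status.
        unoconv -f pdf "$f" || echo "failed: $f" >&2
    done
}
```

Calling convert_dir ~/Documents/clients would then work through that directory one file at a time, and any file that fails is reported on standard error.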

When it comes to exporting to MS Office formats, you have slightly more control. You can, for instance, target the formats doc, doc6 and doc95, meaning Word 97/2000/XP, Word 6.0 and Word 95, respectively.
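As a rough illustration of picking one of those Word targets, here is a small wrapper. The name to_word is invented for this example; the only format names it accepts are the three mentioned above, and it simply passes everything else on to unoconv.

```shell
# Sketch: convert documents to one of the Word formats named in the article.
# to_word is a hypothetical helper, not part of unoconv.
to_word() {
    fmt=$1
    shift
    case $fmt in
        doc|doc6|doc95)
            unoconv -f "$fmt" "$@"
            ;;
        *)
            echo "unknown Word format: $fmt" >&2
            return 1
            ;;
    esac
}
```

For example, to_word doc95 report.odt targets Word 95 (report.odt is a placeholder file name). Running unoconv --show should list the format names your installed version actually supports.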

The project is alive, so there is good hope that the final glitches will be sorted out. The tool spits out a couple of scary warnings now and then, but the documents seem to turn out well.

The conversion is based entirely on OpenOffice's own conversion, so the quality is what you know from there. Since the conversion is automatic, you might have to limit yourself at times. For instance, I've learned that avoiding the arrow connectors and relying on plain lines instead helps in ODP to PPT conversion. Also, the PPTs produced are not compatible with the very latest MS Office on Mac, but then you can create PDFs just as easily.

______________________

Johan Thelin is a consultant working with Qt, embedded and free
software. On-line, he is known as e8johan.

Comments

Don't get it working

dellhanks30

I can't get it working. I'm on Ubuntu and used the following commands:

$ sudo apt-get install unoconv

$ unoconv --listener

But somehow my server trips up after those commands...

Can someone give me some tips? Or maybe suggest another program? Is Fedora useful as well?

Thanks a lot! Marcus Faul

The unoconv --listener server

Johan Thelin

The unoconv --listener server program starts an instance of LibreOffice/OpenOffice, or more precisely soffice (the original name was StarOffice). Make sure you have exited all your office sessions, and use ps ax | grep office to ensure that all soffice binaries have been killed before you run the unoconv --listener command.
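That check can be folded into a small start-up helper. Treat this as a sketch under a few assumptions: start_listener is an invented name, pgrep is assumed to be installed, the office processes are assumed to have "soffice" in their process name, and five seconds is just a guess at how long the listener needs to settle.

```shell
# Sketch: refuse to start the listener while an office process is still alive.
# start_listener is a hypothetical helper, not part of unoconv.
start_listener() {
    # pgrep matches against process names; "soffice" is the assumed pattern.
    if pgrep soffice >/dev/null; then
        echo "soffice is still running; close or kill it first" >&2
        return 1
    fi
    unoconv --listener &
    # Give the background office instance a few seconds to settle.
    sleep 5
}
```

If start_listener returns successfully, the conversion commands from the article can follow straight away.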

cloud?

jonredwhiteblue

My query may be tangential, but I'm wondering what the future of documents looks like on Linux. I have an abundance of documents to convert and then send off via external message delivery, and I am seeking assistance on both ends of this task. Cursory research online has turned up some potential solutions for the delivery (http://www.port25.com/ and a few others came up in my initial search), but I'm a ways off from deciding, so there is plenty of time to research the best fit. The problem is that I have no idea where to start with the conversion of the documents.

I have been infatuated with Linux and its sleek, programmer-friendly interface for a long time. Admittedly, I have not spent the requisite amount of time learning the OS to feel comfortable doing this myself.

I'm not sure whether I'm lazy or just inundated by the constant chatter I hear about storage solutions such as "the cloud", but I keep thinking this should be easier than it is.

All that being said, I appreciate the tutorial. At least I know of one solution that works. Time will tell if I can actually implement it for my own situation!

We need a tool for MASS-CONVERTING documents

Anonymous

The community badly needs a tool for mass-converting Microsoft Office documents to OpenDocument, with as little loss in functionality as possible. Problems occur with documents containing macros, special formatting, etc.

I have even dreamt of a Postfix plugin that I'd install on my mail server to convert .doc attachments to .odt on the fly...

How about shell-scripting it?

Sum Yung Gai

One way I could see to do that mass-conversion would be to do something like this, using odt as the "destination".

find . -type f \( -name '*.doc' -o -name '*.docx' \) |
while IFS= read -r filename
do
    unoconv -f odt "$filename"
done

You could do the same thing with .xls/.xlsx, .ppt/.pptx, and so on.

Script for conversion

Anonymous

Thanks for the idea, but that's for programmers only. And we need a more refined script: there are documents that do not convert well, documents with passwords, documents with macros, etc. We need to handle those exceptions.

The „Conversion Assistant” in LibreOffice's File menu is a good starting point.

My „dream” is to have a Postfix plugin that does the following: when a mail with a .doc attachment arrives, it converts the attachment to .odt (and optionally replies to the sender "Please don't send documents in proprietary formats") :)

How about HTML

ronnn

Is HTML one of the supported formats?

I recently had a user question about the best way of converting DOCs to HTML.

SAVE AS HTML is 'orrible
SAVE AS HTML FILTERED is somewhat better.
Is there anything better than those 2 options?

Why not use another format?

swissman

If you find it difficult to convert DOCs to HTML, why not try another format? I don't usually use DOC as the base for creating an HTML page. It tends to change the code I put in.

cheers!

I have a program on my Fedora

Anonymous

I have a program on my Fedora computer called pdftohtml. I guess you could use unoconv to convert a doc file to pdf, and then convert that to html using pdftohtml.
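A rough sketch of that two-step route is shown here. The helper name doc_to_html is made up, and it assumes that unoconv writes the PDF next to the input file with the extension swapped and that pdftohtml accepts the PDF path as its only argument.

```shell
# Sketch: DOC -> PDF with unoconv, then PDF -> HTML with pdftohtml.
# doc_to_html is a hypothetical helper, not part of either tool.
doc_to_html() {
    src=$1
    # Assumption: unoconv writes the PDF beside the input, extension swapped.
    pdf=${src%.*}.pdf
    unoconv -f pdf "$src" && pdftohtml "$pdf"
}
```

Running doc_to_html report.doc would then leave both the intermediate PDF and the HTML output behind (report.doc is a placeholder name). The result is HTML derived from the PDF layout, so it is worth comparing against the office suite's own HTML export before settling on this route.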
