Convert SpreadSheets to CSV files with Python and pyuno
January 6th, 2009 by Mitch Frazier in
Using the OORunner class that we developed last week we'll now create a Python class for converting spreadsheets into CSV files. The converter supports any type of input spreadsheet that is supported by OpenOffice.
When run the program takes pairs of input and output files, for example:
$ python ssconverter.py file1.xls file1.csv file2.ods file2.csv
Each input file is a spreadsheet and it is converted into the corresponding output file as a CSV file.
The meat of the operation happens in the convert function from the SSConverter class. The first thing it does is start OpenOffice running using the OORunner class. It then converts the input and output file names to URLs and uses the desktop object returned by OORunner to create and load a document object. Converting the spreadsheet to a CSV file is merely a matter of saving the document to the output URL. The source code for the SSConverter class follows:
#!/usr/bin/python
#
# Convert spreadsheet to CSV file.
#
# Based on:
# PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
# Copyright (C) 2008 Mirko Nasato <mirko@artofsolving.com>
# Licensed under the GNU LGPL v2.1 - or any later version.
# http://www.gnu.org/licenses/lgpl-2.1.html
#
import os
import ooutils
import uno
from com.sun.star.task import ErrorCodeIOException
class SSConverter:
"""
Spreadsheet converter class.
Converts spreadsheets to CSV files.
"""
def __init__(self, oorunner=None):
self.desktop = None
self.oorunner = None
def convert(self, inputFile, outputFile):
"""
Convert the input file (a spreadsheet) to a CSV file.
"""
# Start openoffice if needed.
if not self.desktop:
if not self.oorunner:
self.oorunner = ooutils.OORunner()
self.desktop = self.oorunner.connect()
inputUrl = uno.systemPathToFileUrl(os.path.abspath(inputFile))
outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))
try:
# Additional property option:
# FilterOptions="59,34,0,1"
# 59 - Field separator (semicolon), this is the ascii value.
# 34 - Text delimiter (double quote), this is the ascii value.
# 0 - Character set (system).
# 1 - First line number to export.
#
# For more information see:
# http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
#
document.storeToURL(outputUrl, ooutils.oo_properties(FilterName="Text - txt - csv (StarCalc)"))
finally:
document.close(True)
if __name__ == "__main__":
from sys import argv
from os.path import isfile
if len(argv) == 2 and argv[1] == '--shutdown':
ooutils.oo_shutdown_if_running()
else:
if len(argv) < 3 or len(argv) % 2 != 1:
print "USAGE:"
print " python %s INPUT-FILE OUTPUT-FILE INPUT-FILE OUTPUT-FILE..." % argv[0]
print "OR"
print " python %s --shutdown" % argv[0]
exit(255)
if not isfile(argv[1]):
print "File not found: %s" % argv[1]
exit(1)
try:
i = 1
converter = SSConverter()
while i+1 < len(argv):
print '%s => %s' % (argv[i], argv[i+1])
converter.convert(argv[i], argv[i+1])
i += 2
except ErrorCodeIOException, exception:
print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
exit(1)
As with OORunner, this code is based on PyODConverter. Next week we'll write a converter function that creates the CSV file automatically from the corresponding spreadsheet if the CSV file does not exist. In addition it will re-create the CSV file if the spreadsheet is newer than the CSV file. This way you can essentially use spreadsheets and CSV files interchangeably in your code.
__________________________Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
Special Magazine Offer -- Free Gift with Subscription
Receive a free digital copy of Linux Journal's System Administration Special Edition as well as instant online access to current and past issues. CLICK HERE for offer
Linux Journal: delivering readers the advice and inspiration they need to get the most out of their Linux systems since 1994.
Subscribe now!
The Latest
Newsletter
Tech Tip Videos
- Jul-13-09
- Jul-09-09
Recently Popular
From the Magazine
August 2009, #184
If you're a culinary type you've probably heard of Pickled Capers. This month, we present you with an even tastier treat: Kerneled Kapers. That's right Linux so good that you can eat it for dinner. We've got two articles about kernel scheduling: one about real time scheduling and the other about the Completely Fair Scheduler which appeared in Linux 2.6.23. We also have an article on the new Ksplice technology that appeared on the scene just recently. Also in this issue: find out how to make root unprivileged.
And if Kapers aren't your cup of tea we have our usual buffet of articles: eyeOS which allows you to create your own cloud based desktops, using fixtures and factories with Rails, more on secure Squids, a review of the long awaited KOffice 2.0, Longomatch, and Kanatest.
But don't leave before we serve up the "piece de resistance": Point/Counterpoint on Twitter.
Apologies to Chef Marcel for borrowing his shtick.
Delicious
Digg
StumbleUpon
Reddit
Facebook








sheet selection
On January 23rd, 2009 Marcello (not verified) says:
I find very useful your article.
I have a specific question:
What if I need to save a specific sheet (the second for istance..) ?
Again
On January 25th, 2009 Mitch Frazier says:
Ask that again, I'm not following what you want to do.
__________________________Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
Great article
On January 8th, 2009 jza says:
This is really exciting, keep this columns coming. Python is such a great language. For more information please write on the Python wikipage within OpenOffice.org:
http://wiki.services.openoffice.org/wiki/Python
__________________________Alexandro Colorado
Post new comment