Convert SpreadSheets to CSV files with Python and pyuno

 in

Using the OORunner class that we developed last week we'll now create a Python class for converting spreadsheets into CSV files. The converter supports any type of input spreadsheet that is supported by OpenOffice.

When run the program takes pairs of input and output files, for example:

$ python ssconverter.py file1.xls file1.csv file2.ods file2.csv

Each input file is a spreadsheet and it is converted into the corresponding output file as a CSV file.

The meat of the operation happens in the convert function from the SSConverter class. The first thing it does is start OpenOffice running using the OORunner class. It then converts the input and output file names to URLs and uses the desktop object returned by OORunner to create and load a document object. Converting the spreadsheet to a CSV file is merely a matter of saving the document to the output URL. The source code for the SSConverter class follows:

#!/usr/bin/python
#
# Convert spreadsheet to CSV file.
#
# Based on:
#   PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
#   Copyright (C) 2008 Mirko Nasato <mirko@artofsolving.com>
#   Licensed under the GNU LGPL v2.1 - or any later version.
#   http://www.gnu.org/licenses/lgpl-2.1.html
#

import os
import ooutils

import uno
from com.sun.star.task import ErrorCodeIOException



class SSConverter:
    """
    Spreadsheet converter class.
    Converts spreadsheets to CSV files.
    """

    def __init__(self, oorunner=None):
        self.desktop  = None
        self.oorunner = None


    def convert(self, inputFile, outputFile):
        """
        Convert the input file (a spreadsheet) to a CSV file.
        """

        # Start openoffice if needed.
        if not self.desktop:
            if not self.oorunner:
                self.oorunner = ooutils.OORunner()

            self.desktop = self.oorunner.connect()

        inputUrl  = uno.systemPathToFileUrl(os.path.abspath(inputFile))
        outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
        document  = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))

        try:
            # Additional property option:
            #   FilterOptions="59,34,0,1"
            #     59 - Field separator (semicolon), this is the ascii value.
            #     34 - Text delimiter (double quote), this is the ascii value.
            #      0 - Character set (system).
            #      1 - First line number to export.
            #
            # For more information see:
            #   http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
            #
            document.storeToURL(outputUrl, ooutils.oo_properties(FilterName="Text - txt - csv (StarCalc)"))
        finally:
            document.close(True)


if __name__ == "__main__":
    from sys import argv
    from os.path import isfile

    if len(argv) == 2  and  argv[1] == '--shutdown':
        ooutils.oo_shutdown_if_running()
    else:
        if len(argv) < 3  or  len(argv) % 2 != 1:
            print "USAGE:"
            print "  python %s INPUT-FILE OUTPUT-FILE INPUT-FILE OUTPUT-FILE..." % argv[0]
            print "OR"
            print "  python %s --shutdown" % argv[0]
            exit(255)
        if not isfile(argv[1]):
            print "File not found: %s" % argv[1]
            exit(1)

        try:
            i = 1
            converter = SSConverter()

            while i+1 < len(argv):
                print '%s => %s' % (argv[i], argv[i+1])
                converter.convert(argv[i], argv[i+1])
                i += 2

        except ErrorCodeIOException, exception:
            print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
            exit(1)

As with OORunner, this code is based on PyODConverter. Next week we'll write a converter function that creates the CSV file automatically from the corresponding spreadsheet if the CSV file does not exist. In addition it will re-create the CSV file if the spreadsheet is newer than the CSV file. This way you can essentially use spreadsheets and CSV files interchangeably in your code.

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Great article

jza's picture

This is really exciting, keep this columns coming. Python is such a great language. For more information please write on the Python wikipage within OpenOffice.org:

http://wiki.services.openoffice.org/wiki/Python

Linuxer, Rapper, and part time lunatic
Living in the sandy beaches of Cancun

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState