Convert Spreadsheets to CSV Files with Python and pyuno
Using the OORunner class that we developed last week, we'll now create a Python class for converting spreadsheets into CSV files. The converter accepts any spreadsheet format that OpenOffice can open.
When run, the program takes pairs of input and output files, for example:
$ python ssconverter.py file1.xls file1.csv file2.ods file2.csv
Each input file is a spreadsheet, and it is converted to CSV and written to the output file that follows it on the command line.
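The pairing of arguments can be sketched on its own, separate from the conversion. The helper below is only an illustration: the name pair_arguments is hypothetical and is not part of ssconverter.py, which walks the argument list with an index instead.

```python
def pair_arguments(args):
    """Group a flat argument list into (input, output) pairs.

    `args` is the command line without the script name, e.g. sys.argv[1:].
    pair_arguments is a hypothetical helper, not part of ssconverter.py.
    """
    if not args or len(args) % 2 != 0:
        raise ValueError("expected INPUT-FILE OUTPUT-FILE pairs")
    # Even positions hold the inputs, odd positions the matching outputs.
    return list(zip(args[0::2], args[1::2]))


pairs = pair_arguments(["file1.xls", "file1.csv", "file2.ods", "file2.csv"])
for input_file, output_file in pairs:
    print("%s => %s" % (input_file, output_file))
```

An odd number of arguments raises ValueError here, which corresponds to the usage message the real program prints.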
The meat of the operation happens in the convert function of the SSConverter class. The first thing it does is start OpenOffice, if it is not already running, using the OORunner class. It then converts the input and output file names to URLs and uses the desktop object returned by OORunner to load the input file as a document object. Converting the spreadsheet to a CSV file is then merely a matter of storing the document to the output URL with the CSV filter. The source code for the SSConverter class follows:
#!/usr/bin/python
#
# Convert spreadsheet to CSV file.
#
# Based on:
#   PyODConverter (Python OpenDocument Converter) v1.0.0 - 2008-05-05
#   Copyright (C) 2008 Mirko Nasato <firstname.lastname@example.org>
#   Licensed under the GNU LGPL v2.1 - or any later version.
#   http://www.gnu.org/licenses/lgpl-2.1.html
#

import os

import ooutils

import uno
from com.sun.star.task import ErrorCodeIOException


class SSConverter:
    """
    Spreadsheet converter class.
    Converts spreadsheets to CSV files.
    """

    def __init__(self, oorunner=None):
        self.desktop = None
        # Use the caller's OORunner if one was supplied.
        self.oorunner = oorunner

    def convert(self, inputFile, outputFile):
        """
        Convert the input file (a spreadsheet) to a CSV file.
        """

        # Start openoffice if needed.
        if not self.desktop:
            if not self.oorunner:
                self.oorunner = ooutils.OORunner()

            self.desktop = self.oorunner.connect()

        # OpenOffice works with file URLs, not system paths.
        inputUrl = uno.systemPathToFileUrl(os.path.abspath(inputFile))
        outputUrl = uno.systemPathToFileUrl(os.path.abspath(outputFile))
        document = self.desktop.loadComponentFromURL(
            inputUrl, "_blank", 0, ooutils.oo_properties(Hidden=True))

        try:
            # Additional property option:
            #   FilterOptions="59,34,0,1"
            #     59 - Field separator (semicolon), this is the ascii value.
            #     34 - Text delimiter (double quote), this is the ascii value.
            #      0 - Character set (system).
            #      1 - First line number to export.
            #
            # For more information see:
            #   http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
            #
            document.storeToURL(
                outputUrl,
                ooutils.oo_properties(FilterName="Text - txt - csv (StarCalc)"))
        finally:
            # Always close the document, even if the export fails.
            document.close(True)


if __name__ == "__main__":
    from sys import argv, exit
    from os.path import isfile

    if len(argv) == 2 and argv[1] == '--shutdown':
        ooutils.oo_shutdown_if_running()
    else:
        # Arguments must come in INPUT-FILE OUTPUT-FILE pairs.
        if len(argv) < 3 or len(argv) % 2 != 1:
            print("USAGE:")
            print("  python %s INPUT-FILE OUTPUT-FILE INPUT-FILE OUTPUT-FILE..."
                  % argv[0])
            print("OR")
            print("  python %s --shutdown" % argv[0])
            exit(255)
        if not isfile(argv[1]):
            print("File not found: %s" % argv[1])
            exit(1)

        try:
            i = 1
            converter = SSConverter()

            while i+1 < len(argv):
                print('%s => %s' % (argv[i], argv[i+1]))
                converter.convert(argv[i], argv[i+1])
                i += 2

        except ErrorCodeIOException as exception:
            print("ERROR! ErrorCodeIOException %d" % exception.ErrCode)
            exit(1)
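The FilterOptions string described in the comment above is just four comma-separated numbers, two of which are ASCII codes. As a rough sketch, a small helper could build that string from readable characters. The name csv_filter_options and its parameters are assumptions made for illustration; nothing like it exists in ooutils or pyuno, and only the four fields listed in the comment are covered.

```python
def csv_filter_options(separator=";", quote='"', charset=0, first_line=1):
    """Build a CSV FilterOptions string such as "59,34,0,1".

    Hypothetical helper (not part of ooutils or pyuno). The fields follow
    the order given in the comment in convert():
      separator  - field separator character, sent as its ASCII value
      quote      - text delimiter character, sent as its ASCII value
      charset    - character set number (0 means the system character set)
      first_line - number of the first line to export
    """
    return "%d,%d,%d,%d" % (ord(separator), ord(quote), charset, first_line)


print(csv_filter_options())               # semicolon-separated
print(csv_filter_options(separator=","))  # comma-separated
```

The resulting string would presumably be passed as a FilterOptions property next to FilterName when storing the document, as the comment suggests.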
As with OORunner, this code is based on PyODConverter. Next week we'll write a converter function that creates the CSV file automatically from the corresponding spreadsheet if the CSV file does not exist. In addition, it will re-create the CSV file if the spreadsheet is newer than the CSV file. This way you can essentially use spreadsheets and CSV files interchangeably in your code.
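The decision that function has to make, whether the CSV file is missing or out of date, can be sketched with nothing more than file modification times. This is only one possible approach and not the code from next week's article; the name needs_conversion is hypothetical.

```python
import os


def needs_conversion(spreadsheet, csv_file):
    """Return True if csv_file is missing or older than spreadsheet.

    needs_conversion is a hypothetical name used for this sketch only.
    """
    if not os.path.exists(csv_file):
        return True
    # Re-create the CSV file when the spreadsheet was modified after it.
    return os.path.getmtime(spreadsheet) > os.path.getmtime(csv_file)
```

A caller would check needs_conversion(inputFile, outputFile) first and only call SSConverter.convert when it returns True, so OpenOffice is not started for files that are already up to date.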
Mitch Frazier is an Associate Editor for Linux Journal.