Handling CSV Files in Python
As a buddy of mine always says, "the nice thing about standards is that there's so many to choose from". Take CSV files, for example. CSV, of course, stands for "Comma Separated Values"; more often than not, though, it seems that CSV files use tabs rather than commas to separate values. And let's not even mention field quoting. If you deal with CSV files and you use Python, the csv module can make your life a bit easier.
Dealing with CSV files in Python probably couldn't be much easier. As an example, let's use the following CSV file, which contains three columns named "A", "B", and "C D":
    $ cat test.csv
    A,B,"C D"
    1,2,"3 4"
    5,6,7
The following Python program reads it and displays its contents:
    import csv

    ifile = open('test.csv', "rb")
    reader = csv.reader(ifile)

    rownum = 0
    for row in reader:
        # Save header row.
        if rownum == 0:
            header = row
        else:
            colnum = 0
            for col in row:
                print '%-8s: %s' % (header[colnum], col)
                colnum += 1

        rownum += 1

    ifile.close()
When run, it produces:
    $ python csv1.py
    A       : 1
    B       : 2
    C D     : 3 4
    A       : 5
    B       : 6
    C D     : 7
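The listing above uses Python 2 syntax (the print statement and a file opened in "rb" mode). As a rough sketch of the same idea in Python 3, the version below pairs header names with values using zip() instead of counting columns by hand. The function name format_rows and the in-memory SAMPLE string are assumptions made here for illustration; they aren't part of the original program.

```python
import csv
import io

# Same contents as test.csv, held in memory so the sketch is self-contained.
SAMPLE = 'A,B,"C D"\n1,2,"3 4"\n5,6,7\n'


def format_rows(csvfile):
    """Return 'header: value' lines for every data row of an open CSV file.

    format_rows is a hypothetical helper name used only for this sketch.
    """
    reader = csv.reader(csvfile)
    header = next(reader)  # the first row holds the column names
    lines = []
    for row in reader:
        # zip() pairs each column name with the value in the same position.
        for name, value in zip(header, row):
            lines.append('%-8s: %s' % (name, value))
    return lines


if __name__ == '__main__':
    for line in format_rows(io.StringIO(SAMPLE)):
        print(line)
```

With a real file in Python 3, the csv documentation recommends opening it in text mode with newline='', for example open('test.csv', newline=''), rather than in "rb" mode.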
In addition, the csv module provides writer objects for writing CSV files. The following Python program converts our test CSV file to a CSV file that uses tabs as the value separator and that has all values quoted. The delimiter character and the quote character, as well as how and when to quote, are specified when the writer is created. These same options are available when creating reader objects.
    import csv

    ifile = open('test.csv', "rb")
    reader = csv.reader(ifile)
    ofile = open('ttest.csv', "wb")
    writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

    for row in reader:
        writer.writerow(row)

    ifile.close()
    ofile.close()
Running it produces:
    $ python csv2.py
    $ cat ttest.csv
    "A"     "B"     "C D"
    "1"     "2"     "3 4"
    "5"     "6"     "7"
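Since the same options apply to readers, here is a small Python 3 sketch that writes the tab-separated, fully quoted form and then reads it back with a matching reader. The function name convert_to_tabs and the use of in-memory io.StringIO objects in place of real files are assumptions made for illustration only.

```python
import csv
import io


def convert_to_tabs(infile, outfile):
    """Copy CSV rows from infile to outfile, tab-separated with every field quoted.

    convert_to_tabs is a hypothetical helper name used only for this sketch.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter='\t', quotechar='"',
                        quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)


if __name__ == '__main__':
    source = io.StringIO('A,B,"C D"\n1,2,"3 4"\n5,6,7\n')
    target = io.StringIO()
    convert_to_tabs(source, target)

    # Show what was written, one line per row.
    for line in target.getvalue().splitlines():
        print(line)

    # Reading the result back needs the same delimiter and quote character.
    target.seek(0)
    for row in csv.reader(target, delimiter='\t', quotechar='"'):
        print(row)
```

When working with real files in Python 3, both the input and the output file would normally be opened in text mode with newline='' so that the csv module controls the line endings itself.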
My first task when starting to use the csv module was to write a function to try to determine what format the CSV file was in before opening it so that I could deal with commas and tabs and different quoting conventions:
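    import csv

    def opencsv(filename):
        # Peek at the first line to guess the quoting and delimiter conventions.
        tfile = open(filename, "r")
        line = tfile.readline()
        tfile.close()

        # A leading quote character suggests that every field is quoted.
        if line.startswith('"'):
            quote_char = '"'
            quote_opt = csv.QUOTE_ALL
        elif line.startswith("'"):
            quote_char = "'"
            quote_opt = csv.QUOTE_ALL
        else:
            quote_char = '"'
            quote_opt = csv.QUOTE_MINIMAL

        # Any tab in the first line is taken to mean a tab-separated file.
        if line.find('\t') != -1:
            delim_char = '\t'
        else:
            delim_char = ','

        # Reopen the file and return it along with the reader so the caller can close it.
        tfile = open(filename, "rb")
        reader = csv.reader(tfile, delimiter=delim_char, quotechar=quote_char, quoting=quote_opt)
        return (tfile, reader)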
Being new to the csv module and making the common mistake of not reading the whole "man" page, I of course failed to notice that the csv module already contains something to do this called the Sniffer class. I'll leave using it as an exercise for the reader (and in this case the writer also).
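For anyone who wants a head start on that exercise, here is a minimal Python 3 sketch of how the Sniffer class might be used for the same job. The function name open_sniffed, the sample_size parameter, and the decision to consider only commas and tabs as delimiters are all assumptions made for this example; the Sniffer guesses from a sample of the data, so it can guess wrong (or raise csv.Error) on unusual input.

```python
import csv
import io


def open_sniffed(csvfile, sample_size=1024):
    """Return a csv reader configured from a dialect guessed by csv.Sniffer.

    open_sniffed and sample_size are hypothetical names used for this sketch.
    """
    sample = csvfile.read(sample_size)
    csvfile.seek(0)  # rewind so the reader still sees every row
    # Only commas and tabs are considered as delimiters here (an assumption).
    dialect = csv.Sniffer().sniff(sample, delimiters=',\t')
    return csv.reader(csvfile, dialect)


if __name__ == '__main__':
    comma_file = io.StringIO('A,B,"C D"\n1,2,"3 4"\n5,6,7\n')
    tab_file = io.StringIO('"A"\t"B"\t"C D"\n"1"\t"2"\t"3 4"\n"5"\t"6"\t"7"\n')
    for csvfile in (comma_file, tab_file):
        for row in open_sniffed(csvfile):
            print(row)
```

Unlike opencsv above, this sketch takes an already open file object, which leaves opening and closing the file to the caller.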
Mitch Frazier is an Associate Editor for Linux Journal.