Handling CSV Files in Python
As a buddy of mine always says "the nice thing about standards is that there's so many to choose from". Take CSV files for example. CSV, of course, stands for "Comma Separated Values", more often than not though, it seems that CSV files use tabs to separate values rather than commas. And let's not even mention field quoting. If you deal with CSV files and you use Python the csv module can make your life a bit easier.
<!-- break -->Dealing with CSV files in Python probably couldn't be much easier. For example purposes, let's use the following CSV file that contains 3 columns "A", "B", and "C D":
$ cat test.csv A,B,"C D" 1,2,"3 4" 5,6,7
The following python program reads it and displays its contents:
<!-- csv1.py -->import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
# Save header row.
if rownum == 0:
header = row
else:
colnum = 0
for col in row:
print '%-8s: %s' % (header[colnum], col)
colnum += 1
rownum += 1
ifile.close()
When run it produces:
$ python csv1.py A : 1 B : 2 C D : 3 4 A : 5 B : 6 C D : 7
In addition, the csv module provides writer objects for writing CSV files. The following Python program converts our test CSV file to a CSV file that uses tabs as a value separator and that has all values quoted. The delimiter character and the quote character, as well as how/when to quote, are specifed when the writer is created. These same options are available when creating reader objects.
<!-- csv2.py -->import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
writer.writerow(row)
ifile.close()
ofile.close()
Running it produces:
$ python csv2.py $ cat ttest.csv "A" "B" "C D" "1" "2" "3 4" "5" "6" "7"
My first task when starting to use the csv module was to write a function to try to determine what format the CSV file was in before opening it so that I could deal with commas and tabs and different quoting conventions:
<!-- opencsv.py -->import os
import sys
import csv
def opencsv(filename):
tfile = open(filename, "r")
line = tfile.readline()
tfile.close()
if line[0] == '"':
quote_char = '"'
quote_opt = csv.QUOTE_ALL
elif line[0] == "'":
quote_char = "'"
quote_opt = csv.QUOTE_ALL
else:
quote_char = '"'
quote_opt = csv.QUOTE_MINIMAL
if line.find('\t') != -1:
delim_char = '\t'
else:
delim_char = ','
tfile = open(filename, "rb")
reader = csv.reader(tfile, delimiter=delim_char, quotechar=quote_char, quoting=quote_opt)
return (tfile, reader)
Being new to the csv module and making the common mistake of not reading the whole "man" page, I of course failed to notice that the csv module already contains something to do this called the Sniffer class. I'll leave using it as an exercise for the reader (and in this case the writer also).
Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.










This week 5 lucky Members will receive a copy of The Official Ubuntu Server Book by Benjamin Mako Hill and Linux Journal's very own Kyle Rankin. No entry necessary. Check back here early next week to find out who the lucky Online Members are.




Comments
How would you select clumns to put in new file
I came accross this code for perl:
http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Choose.html
perl -e " @cols=(1, -1, 2); while(<>) { s/\r?\n//; @F=split /\t/, $_; print join( qq~\t~, @F[@cols]), qq~\n~ } warn qq~\nChose columns ~, join( qq~, ~, @cols), qq~ for $. lines\n\n~ " all_cols > some_cols_chosen
and I am trying to get a python code that does the same and will allow me to write say only the first 13 columns from a csv file to a new file.
Thanks
Shawn
more info
Hi,
How about if i want to output it becoming...
A : B
A : C D
1 : 2
1 : 3 4
5 : 6
5 : 7
Thanks
Try this
This should do it:
Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
Nice Post
Very nice. I use a lot of CSV files for data analysis using R. I hope to become more involved in building and standardizing options for doing data analysis in Python as Python opens up the world of 'real' programming languages: eventually using R's functionality with webapps/gtk/etc.
CSV Files to HTML
Hi !
I recently wrote a quick script to convert CSV files of the form: foo, bar, blah, blah into nice HTML displays. Here it is:
#!/usr/bin/python # -*- coding: utf-8 -*- # CSV to HTML Converter # Amit K. Saha import csv import sys if len(sys.argv) < 2: print "Usage: ./csv-html.py " print print exit(0) # Open the CSV file for reading reader = csv.reader(open(sys.argv[1])) # Create the HTML file f_html = open(sys.argv[2],"w"); f_html.write('') for row in reader: # Read a single row from the CSV file f_html.write('');# Create a new row in the table for column in row: # For each column.. f_html.write('' + column + ''); f_html.write('') f_html.write('')Hope it is useful!
Making the first example more Pythonic
That first code example reads a bit more cleanly if you use some standard Pythonisms (
zip()and using the reader as an iterator, callingnext()to get the header)import csv
ifile = open("test.csv", "rb")
reader = csv.reader(ifile)
headers = reader.next()
for row in reader:
for header, col in zip(headers,row):
print "%-8s: %s" % (header, col)
ifile.close()
-gumnos
Thanks
Actually, I'm fairly new to Python programming so I can use all the tips you got. For others not familiar with the Python builtin function zip(), this is from the docs:
In other words, for the example here, it pairs each column with the corresponding header. Using the first data row from the test file, zip(headers,row) would return:
[('A', '1'), ('B', '2'), ('C D', '3 4')]Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.
Post new comment