Handling CSV Files in Python

December 16th, 2008 by Mitch Frazier in

Your rating: None Average: 4.5 (11 votes)

As a buddy of mine always says "the nice thing about standards is that there's so many to choose from". Take CSV files for example. CSV, of course, stands for "Comma Separated Values", more often than not though, it seems that CSV files use tabs to separate values rather than commas. And let's not even mention field quoting. If you deal with CSV files and you use Python the csv module can make your life a bit easier.

Dealing with CSV files in Python probably couldn't be much easier. For example purposes, let's use the following CSV file that contains 3 columns "A", "B", and "C D":

  $ cat test.csv
  A,B,"C D"
  1,2,"3 4"
  5,6,7

The following python program reads it and displays its contents:

import csv

ifile  = open('test.csv', "rb")
reader = csv.reader(ifile)

rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
            
    rownum += 1

ifile.close()

When run it produces:

  $ python csv1.py 
  A       : 1
  B       : 2
  C D     : 3 4
  A       : 5
  B       : 6
  C D     : 7

In addition, the csv module provides writer objects for writing CSV files. The following Python program converts our test CSV file to a CSV file that uses tabs as a value separator and that has all values quoted. The delimiter character and the quote character, as well as how/when to quote, are specifed when the writer is created. These same options are available when creating reader objects.

import csv

ifile  = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile  = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for row in reader:
    writer.writerow(row)

ifile.close()
ofile.close()

Running it produces:

  $ python csv2.py
  $ cat ttest.csv
  "A"     "B"     "C D"
  "1"     "2"     "3 4"
  "5"     "6"     "7"

My first task when starting to use the csv module was to write a function to try to determine what format the CSV file was in before opening it so that I could deal with commas and tabs and different quoting conventions:

import os
import sys
import csv

def opencsv(filename):
    tfile = open(filename, "r")
    line  = tfile.readline()
    tfile.close()
    if   line[0] == '"':
        quote_char = '"'
        quote_opt  = csv.QUOTE_ALL
    elif line[0] == "'":
        quote_char = "'"
        quote_opt  = csv.QUOTE_ALL
    else:
        quote_char = '"'
        quote_opt  = csv.QUOTE_MINIMAL
    
    if   line.find('\t') != -1:
        delim_char = '\t'
    else:
        delim_char = ','
    
    tfile  = open(filename, "rb")
    reader = csv.reader(tfile, delimiter=delim_char, quotechar=quote_char, quoting=quote_opt)
    return (tfile, reader)

Being new to the csv module and making the common mistake of not reading the whole "man" page, I of course failed to notice that the csv module already contains something to do this called the Sniffer class. I'll leave using it as an exercise for the reader (and in this case the writer also).

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.


Special Magazine Offer -- Free Gift with Subscription
Receive a free digital copy of Linux Journal's System Administration Special Edition as well as instant online access to current and past issues. CLICK HERE for offer

Linux Journal: delivering readers the advice and inspiration they need to get the most out of their Linux systems since 1994.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.
Sara's picture

more info

On June 4th, 2009 Sara (not verified) says:

Hi,

How about if i want to output it becoming...

A : B
A : C D
1 : 2
1 : 3 4
5 : 6
5 : 7

Thanks

Mitch Frazier's picture

Try this

On June 4th, 2009 Mitch Frazier says:

This should do it:

import csv

ifile  = open('test.csv', "rb")
reader = csv.reader(ifile)

rownum = 0
for row in reader:
    colnum = 0
    for col in row[1:]:
        print '%-8s: %s' % (row[0], col)
        colnum += 1

    rownum += 1

ifile.close()
__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

Skylar Saveland's picture

Nice Post

On December 25th, 2008 Skylar Saveland (not verified) says:

Very nice. I use a lot of CSV files for data analysis using R. I hope to become more involved in building and standardizing options for doing data analysis in Python as Python opens up the world of 'real' programming languages: eventually using R's functionality with webapps/gtk/etc.

Amit's picture

CSV Files to HTML

On December 21st, 2008 Amit (not verified) says:

Hi !

I recently wrote a quick script to convert CSV files of the form: foo, bar, blah, blah into nice HTML displays. Here it is:

#!/usr/bin/python
# -*- coding: utf-8 -*-

# CSV to HTML Converter

# Amit K. Saha 

import csv
import sys

if len(sys.argv) < 2:
  print "Usage: ./csv-html.py  "
  print
  print
  exit(0)

# Open the CSV file for reading
reader = csv.reader(open(sys.argv[1]))

# Create the HTML file
f_html = open(sys.argv[2],"w");
f_html.write('')

for row in reader: # Read a single row from the CSV file
  f_html.write('');# Create a new row in the table
  for column in row: # For each column..
    f_html.write('' + column + '');
  f_html.write('')


f_html.write('')

Hope it is useful!

Gumnos's picture

Making the first example more Pythonic

On December 16th, 2008 Gumnos (not verified) says:

That first code example reads a bit more cleanly if you use some standard Pythonisms (zip() and using the reader as an iterator, calling next() to get the header)


import csv
ifile = open("test.csv", "rb")
reader = csv.reader(ifile)
headers = reader.next()
for row in reader:
  for header, col in zip(headers,row):
    print "%-8s: %s" % (header, col)
ifile.close()

-gumnos

Mitch Frazier's picture

Thanks

On December 16th, 2008 Mitch Frazier says:

Actually, I'm fairly new to Python programming so I can use all the tips you got. For others not familiar with the Python builtin function zip(), this is from the docs:

zip([iterable, ...])

This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.

In other words, for the example here, it pairs each column with the corresponding header. Using the first data row from the test file, zip(headers,row) would return:

  [('A', '1'), ('B', '2'), ('C D', '3 4')]
__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

Post new comment

Please note that comments may not appear immediately, so there is no need to repost your comment.
The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.

More information about formatting options

Newsletter

Each week Linux Journal editors will tell you what's hot in the world of Linux. You will receive late breaking news, technical tips and tricks, and links to in-depth stories featured on www.linuxjournal.com.
Sign up for our Email Newsletter

Tech Tip Videos

From the Magazine

December 2009, #188

If last month's Infrastrucuture issue was too "big" for you then try on this month's Embedded issue. Find out how to use Player for programming mobile robots, build a humidity controller for your root cellar, find out how to reduce the boot time of your embedded system, and if you're new to embedded systems find out the basics that go into one. You can also read about the Beagle Board, the Mesh Potato and a spate of other interestingly named items. And along with our regular columns don't miss our new monthly column: Economy Size Geek.


Read this issue