Handling CSV Files in Python
As a buddy of mine always says "the nice thing about standards is that there's so many to choose from". Take CSV files for example. CSV, of course, stands for "Comma Separated Values", more often than not though, it seems that CSV files use tabs to separate values rather than commas. And let's not even mention field quoting. If you deal with CSV files and you use Python the csv module can make your life a bit easier.
Dealing with CSV files in Python probably couldn't be much easier. For example purposes, let's use the following CSV file that contains 3 columns "A", "B", and "C D":
$ cat test.csv A,B,"C D" 1,2,"3 4" 5,6,7
The following python program reads it and displays its contents:
import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
# Save header row.
if rownum == 0:
header = row
else:
colnum = 0
for col in row:
print '%-8s: %s' % (header[colnum], col)
colnum += 1
rownum += 1
ifile.close()
When run it produces:
$ python csv1.py A : 1 B : 2 C D : 3 4 A : 5 B : 6 C D : 7
In addition, the csv module provides writer objects for writing CSV files. The following Python program converts our test CSV file to a CSV file that uses tabs as a value separator and that has all values quoted. The delimiter character and the quote character, as well as how/when to quote, are specifed when the writer is created. These same options are available when creating reader objects.
import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
writer.writerow(row)
ifile.close()
ofile.close()
Running it produces:
$ python csv2.py $ cat ttest.csv "A" "B" "C D" "1" "2" "3 4" "5" "6" "7"
My first task when starting to use the csv module was to write a function to try to determine what format the CSV file was in before opening it so that I could deal with commas and tabs and different quoting conventions:
import os
import sys
import csv
def opencsv(filename):
tfile = open(filename, "r")
line = tfile.readline()
tfile.close()
if line[0] == '"':
quote_char = '"'
quote_opt = csv.QUOTE_ALL
elif line[0] == "'":
quote_char = "'"
quote_opt = csv.QUOTE_ALL
else:
quote_char = '"'
quote_opt = csv.QUOTE_MINIMAL
if line.find('\t') != -1:
delim_char = '\t'
else:
delim_char = ','
tfile = open(filename, "rb")
reader = csv.reader(tfile, delimiter=delim_char, quotechar=quote_char, quoting=quote_opt)
return (tfile, reader)
Being new to the csv module and making the common mistake of not reading the whole "man" page, I of course failed to notice that the csv module already contains something to do this called the Sniffer class. I'll leave using it as an exercise for the reader (and in this case the writer also).
Mitch Frazier is an Associate Editor for Linux Journal.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- RSS Feeds
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Designing Electronics with Linux
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- What's the tweeting protocol?
- Kernel Problem
4 hours 7 min ago - BASH script to log IPs on public web server
8 hours 34 min ago - DynDNS
12 hours 10 min ago - Reply to comment | Linux Journal
12 hours 43 min ago - All the articles you talked
15 hours 6 min ago - All the articles you talked
15 hours 9 min ago - All the articles you talked
15 hours 11 min ago - myip
19 hours 35 min ago - Keeping track of IP address
21 hours 26 min ago - Roll your own dynamic dns
1 day 2 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
csv content formatting at csv file level
I am writing out to csv files a lot using csv module: The content if fairly 3 to 6 lines for each cell. But when I launch the created csv file which is different every time I run it due to timestamp embedded in the file name, the cell only shows few of the text.
Is there any function or fomatting along with the csv module which would enable the csv files to show all text for a cell for every cell instead of dragging the columns wider or going to format and autoformat each time.
eg. POWERUPREASON : 0x00000080
BL_VERSION : 0x00000005
UBOOT_VERSION : 0x00000000
Above is content for one cell, but the generated cell in file shows only this much text "POWERUP" when I drag the cell it shows all text but not initially. Please help
how to control precision with csv write
Hi,
I am writing & reading to both csv files (using the csv module) and to MS SQL databases (using pymssql). By default, they use different numeric precision levels. Specifically, the csv module uses only 12 digits while the DB module uses 17 digits. I may at times need precision greater than 12 digits, but I won't need beyond 17 digits. So I can use the pymssql default. But mostly, I'd like to know how to adjust these precision levels so that I have some control over it.
Here is an example small code, but it's trivial to test out in whichever way you want:
==================
import csv
myWriter = csv.writer(sys.stdout)
myWriter.writerow ( ('whatever', 12.345678901234567890) )
==================
will yield:
testing, 12.3456789012
(12 digits, total)
I know that I could do some sort of manipulation line by line, however the data that I am going to save is quite large, both in having many columns (of varying types) and in having many rows.
I have read that python, by default, prints only to 12 decimal points precision. I am therefore hoping that is some sort of default that I can adjust, maybe with a command looking something like:
float.precision(17)
or something similar to change all print defaults to 17. Ideally, I'd like to change something global like that so that I can take a very large list of tuples and just do the following:
myWriter.writerows(myData)
and that's it.
I am not sure how this thread works, but if I am not automatically notified via the e-mail address I provided, could you please e-mail any response to this.is.mvw@gmail? I don't check python or linux web pages frequently, as I am normally not working with either at any developer level.
Thanks!
Mike
Convert to string
If there is a way to change the default formatting precision for numbers I'm not aware of it.
A fairly simple workaround would be to convert your numbers to strings before writing them. For example:
myWriter = csv.writer(sys.stdout) nstr = '%0.19f' % 12.345678901234567890 myWriter.writerow ( ('whatever', nstr) )Mitch Frazier is an Associate Editor for Linux Journal.
thanks!
Hi Mitch,
Thanks so much for the suggestion! Somehow I hadn't though of that. :( I normally like to keep my variables in their "native form", but it doesn't matter if I'm sending it to a file anyway.
Thanks!
Mike
thanks!
this tutorial saved the day! thanks a million...
CSV help
How i can write this using CSV as input will be in CSV file.
Input:
a = 1
b = 2
c = 3
Output
If i print a then it should give 1
If i print b then it should give 2
and so on..........
thanks
Not clear
It's not at all clear what you're asking.
Mitch Frazier is an Associate Editor for Linux Journal.
How would you select clumns to put in new file
I came accross this code for perl:
http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/...
perl -e " @cols=(1, -1, 2); while(<>) { s/\r?\n//; @F=split /\t/, $_; print join( qq~\t~, @F[@cols]), qq~\n~ } warn qq~\nChose columns ~, join( qq~, ~, @cols), qq~ for $. lines\n\n~ " all_cols > some_cols_chosen
and I am trying to get a python code that does the same and will allow me to write say only the first 13 columns from a csv file to a new file.
Thanks
Shawn
more info
Hi,
How about if i want to output it becoming...
A : B
A : C D
1 : 2
1 : 3 4
5 : 6
5 : 7
Thanks
Try this
This should do it:
Mitch Frazier is an Associate Editor for Linux Journal.
Nice Post
Very nice. I use a lot of CSV files for data analysis using R. I hope to become more involved in building and standardizing options for doing data analysis in Python as Python opens up the world of 'real' programming languages: eventually using R's functionality with webapps/gtk/etc.
CSV Files to HTML
Hi !
I recently wrote a quick script to convert CSV files of the form: foo, bar, blah, blah into nice HTML displays. Here it is:
#!/usr/bin/python # -*- coding: utf-8 -*- # CSV to HTML Converter # Amit K. Saha import csv import sys if len(sys.argv) < 2: print "Usage: ./csv-html.py " print print exit(0) # Open the CSV file for reading reader = csv.reader(open(sys.argv[1])) # Create the HTML file f_html = open(sys.argv[2],"w"); f_html.write('') for row in reader: # Read a single row from the CSV file f_html.write('');# Create a new row in the table for column in row: # For each column.. f_html.write('' + column + ''); f_html.write('') f_html.write('')Hope it is useful!
Making the first example more Pythonic
That first code example reads a bit more cleanly if you use some standard Pythonisms (
zip()and using the reader as an iterator, callingnext()to get the header)import csv
ifile = open("test.csv", "rb")
reader = csv.reader(ifile)
headers = reader.next()
for row in reader:
for header, col in zip(headers,row):
print "%-8s: %s" % (header, col)
ifile.close()
-gumnos
Thanks
Actually, I'm fairly new to Python programming so I can use all the tips you got. For others not familiar with the Python builtin function zip(), this is from the docs:
In other words, for the example here, it pairs each column with the corresponding header. Using the first data row from the test file, zip(headers,row) would return:
[('A', '1'), ('B', '2'), ('C D', '3 4')]Mitch Frazier is an Associate Editor for Linux Journal.