Pythonic Parsing Programs
Creed of Python Developers
Pythonistas are eager to extol the lovely virtues of our language. Most beginning Python programmers are invited to run "import this" from the interpreter right after the canonical "hello world". One of the favorite quips from running that command is:
There should be one-- and preferably only one --obvious way to do it.
But the path to Python enlightenment is often covered in rocky terrain, or thorns hidden under leaves.
On that note, I recently had to use some code that parsed a file. A problem arose because the API had been optimized around the assumption that what I wanted to parse would be found in the filesystem of a POSIX-compliant system. The implementation was a method on a class, invoked as Foo.from_filepath.
Well, in 2013, we tend to ignore files and leave those lightweight chisels of the 70s behind in favor of shiny new super-powered jackhammers called NoSQL.
It so happened that I found myself with a string (pulled out of a NoSQL database) containing the contents of a file I wanted to parse. There was no file, no filename, only the data. But my API only supported access through the filename.
Perhaps the pragmatic solution would be to simply throw the contents into a temporary file and be done with it:
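    import tempfile

    data = get_string_data()  # fancy call out to NoSQL

    # mode="w+" lets a str be written on Python 3 (the default mode is binary)
    with tempfile.NamedTemporaryFile(mode="w+") as fp:
        fp.write(data)
        fp.flush()  # make sure the data is on disk before it is reopened by name
        obj = Foo.from_filepath(fp.name)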
But I spent a bit of time thinking about the root of the problem and wanted to see how others solved it. Having a parsing interface that just supports parsing a string is probably a premature optimization on the other end of the spectrum.
A Little Light Reading
My first thought was to look to the source of all truth—The Python Standard Library. Surely it would enlighten me by illuminating all 19 tenets of “The Zen of Python”. I asked myself what modules I used to parse with the standard library and came up with the following list:
(Note: all of the above are module names. At least one of them sits in a nested namespace, which violates Zen tenet #5 “Flat is better than nested”, and at least one violates PEP 8 naming conventions. This is not news to long-time Python programmers, but to newbies here it is a dose of reality. The standard library is not perfect and has its quirks. Even in Python 3. And this is only the tip of the iceberg.)
A Clear Picture
I went through the documentation and source code for these modules to determine the single best, most Pythonic, beautiful, explicit, simple, readable, practical, non-ambiguous, and easy to explain solution to parsing. Specifically, should I parse a filename, a file-like object, or a string? Here is the resulting table I came up with:
(One entry is a 3rd party library, but there has been much hubbub going around recently on the naming of its two loading functions: one is unsafe (but probably the method most will use unless they really pore over the docs), and the other is safe (and hidden away in the docs).)
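For a concrete feel of the string versus file-like distinction, here is a small sketch using a few standard library parsers. The modules and sample data are picked only for illustration and are not meant to reproduce the table; io.StringIO stands in for a real open file.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

text = '{"name": "spam", "count": 3}'

# json: one function for a string, another for a file-like object
from_string = json.loads(text)
from_file = json.load(io.StringIO(text))

# csv: the reader takes any iterable of lines, so a file-like object works
rows = list(csv.reader(io.StringIO("a,b\n1,2\n")))

# ElementTree: parse() takes a filename or a file-like object,
# while fromstring() takes the data itself
root_from_file = ET.parse(io.StringIO("<root><item/></root>")).getroot()
root_from_string = ET.fromstring("<root><item/></root>")
```

In each case, wrapping a string in io.StringIO is enough to satisfy the file-like interface, with no temporary file needed.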
The trick to this table is to spin around three times, really squint your eyes, and pick something from the File column.
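One way to read that advice, sketched under assumptions: make the file-like interface the core entry point and build the filename and string variants on top of it. The Foo class below is a hypothetical stand-in (its "parsing" just collects lines), and from_file and from_string are names made up for this sketch; only from_filepath comes from the API described earlier.

```python
import io


class Foo:
    """Hypothetical stand-in for the parser class discussed above."""

    def __init__(self, lines):
        self.lines = lines

    @classmethod
    def from_file(cls, fp):
        # Core entry point: accepts anything that iterates over lines
        return cls([line.rstrip("\n") for line in fp])

    @classmethod
    def from_filepath(cls, path):
        # Convenience wrapper for data that lives on the filesystem
        with open(path) as fp:
            return cls.from_file(fp)

    @classmethod
    def from_string(cls, data):
        # Convenience wrapper for data already in memory
        return cls.from_file(io.StringIO(data))


obj = Foo.from_string("first\nsecond\n")
print(obj.lines)
```

With this shape, data pulled out of a database goes through from_string (or straight through from_file with io.StringIO), and the temporary file workaround is no longer needed.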
Matt Harrison (@__mharrison__) is a Python developer at Fusion-io where he helps build data analysis tools.