Although it sometimes might seem as if relational databases have gone the way of the dinosaur, making way for non-relational (NoSQL) databases, such as MongoDB and Cassandra, a very large number of systems still depend on a relational database. And, although there is no requirement that a relational database use SQL as its query language, it's a rare database product that does not do so.
The good news is that SQL is relatively easy to work with, particularly when the queries are straightforward. It's fast and easy to create tables, insert data into them, update that data and write queries that retrieve some or all parts of the data. SQL also makes it fairly easy to combine ("join") information from multiple tables, letting you normalize the data, while keeping speed and flexibility at a maximum.
SQL might not be difficult to work with on its own, but you rarely work with it in a vacuum. Usually, your SQL statements reside within a program you have written. The SQL is kept as a text string within the application and is then sent, via a network socket, to the server.
There are several problems with this. First, it means you have to mix two different languages within the same program. Inside your Web application, which you've worked hard to write, and which you try to ensure is maintainable, you have code in a totally separate language, inside strings, which you cannot test or maintain directly.
Even if SQL and a typical server-side language were on equal footing in terms of the syntax of a Web application, there's a fundamental mismatch between the ways in which they handle data. SQL operates with rows and columns within tables; everything in a relational database has to fit into this table-centric view of the world. By contrast, modern programming languages have a rich variety of data structures and typically are object-oriented to some degree or another.
Libraries that bridge the gap between procedural code and SQL are known as object-relational mappers, or ORMs. ORMs typically represent database records as instances of a particular class. In order to represent 50 records, you would need 50 instances of a class, with the state of each instance reflecting the names, types and values of the columns in that record.
There are two basic paradigms for passing data between the object-oriented data structures and the database, both of which were described by Martin Fowler. In the first paradigm, known as Active Record, each instance is tied directly to a row in the database, and the class itself (as well as each object) is responsible for ensuring that the data is saved to the database. In other words, Active Record requires that you create a single class, and that it handles both sides of the object-relational divide. The Active Record class in Ruby on Rails is (not surprisingly) an implementation of this paradigm and provides a great deal of power and flexibility.
A second paradigm is known as Data Mapper, and it requires the use of three different object classes: a class that represents the data itself at the object level, a class that represents the database table and a "mapper" object that acts as a go-between, ensuring that the object and relational parts of the system are appropriately synchronized.
An excellent and popular example of the Data Mapper paradigm can be found in the SQLAlchemy project. SQLAlchemy has been around for a number of years already, and makes it possible to work with relational databases flexibly from within your Python program, without having to write any SQL.
In this article, I take a look at SQLAlchemy, exploring a number of its options and features, and considering how it can be used in Web and other applications.
Connecting to a Database
Installing SQLAlchemy should be straightforward to anyone who has installed Python packages before. You can get SQLAlchemy from PyPi, the Python Package Index, either by downloading it from http://pypi.python.org or by using the easy_install or pip programs to retrieve and install it. I was able to install it with:
pip install sqlalchemy
You might need to install SQLAlchemy as root. Or, you can install it into a virtual machine, using the popular virtualenv package for Python, which gives you nonroot control and permissions over a Python environment.
You also will need to install a driver for the database you intend to use. My favorite relational database is PostgreSQL, and I use the psycopg Python driver, also available on PyPi and (by extension) via pip.
I should note that although I know SQLAlchemy works with Python 3, much of the work I do nowadays is still in Python 2, mostly because that's what my clients are using. My examples, thus, also will be in Python 2, although I believe they will work in Python 3 with little or no change.
Let's assume you have a database table, People:
CREATE TABLE People (id SERIAL PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT, birthday DATE);
Let's also add some initial records:
INSERT INTO People (first_name, last_name, email, birthday) VALUES ('Reuven', 'Lerner', 'email@example.com', '1970-jul-14'), ('Foo', 'Bar', 'firstname.lastname@example.org', '1970-jan-1');
In order to access this table using SQLAlchemy's ORM, you first need to create a database session object, which itself must be created using an "engine". Each database driver has its own style of URL. In the case of PostgreSQL accessed via the psycopg2 driver, you would use something like this:
dburl = 'postgresql+psycopg2://reuven:reuven@localhost/atf'
This URL indicates not only the database and driver type, but also my user name and password ("reuven" in both cases), the hostname (localhost) and the name of the database I'll be accessing ("atf"). If the database is not available at the default PostgreSQL port of 5432, you can specify that as well in the URL.
You then tell Python to create a new engine based on this URL:
from sqlalchemy import create_engine engine = create_engine(dburl)
Now that you have the engine defined, you can create a session based on this engine. Doing so requires two steps: first you create a new, custom Session class for this engine, and then you create an instance of the Session class that you will use to access the database:
from sqlalchemy.orm import sessionmaker # import sessionmaker class Session = sessionmaker(bind=engine) # make custom session type session = Session() # make instance of session
You're now connected to the database! But, that's not quite enough. If you want to map your database table to one or more Python objects, you need to define a class. You do this by defining a normal Python class, with a few subtle changes:
The class must inherit from Base, a class returned from the declarative_base function provided by SQLAlchemy.
The database columns must be defined as class attributes, as instances of the SQLAlchemy-provided Column class.
You connect the class with your database table by defining the
For example, the following Python class provides a mapping to the People database table:
from sqlalchemy import Column, Integer, String, DateTime from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class Person(Base): __tablename__ = 'people' id = Column(Integer, primary_key=True) first_name = Column(String) last_name = Column(String) email = Column(String) birthday = Column(DateTime) def __init__(self, firstname, lastname, email, birthday): self.first_name = firstname self.last_name = lastname self.email = email
It might not be obvious at first glance, but this class implements the Data Mapper design pattern. The class attributes that you have defined describe the columns in the database table and can contain a great deal of detail, including indexes, uniqueness requirements and even custom integrity constraints, such as those provided by PostgreSQL. The class itself is a standard Python class.
But behind the scenes, because this class inherits from Base, you get a
number of other attributes, including
__mapper__, which indicates how
your Python class will be mapped to the database table. You can see
this in an interactive Python shell by asking to see the printed
Person.__mapper__ <Mapper at 0x10af5ef90; Person>
You now have a session that connects to the database and a table in the database that has been described in Python. You now can execute a query against your table:
for p in session.query(Person): print p.first_name
That gives the following:
In other words,
session.query is executing a query against the
database, without you having to specify the SQL. You also can restrict
the records you'll get, by chaining the
filter_by method to your
for p in session.query(Person).filter_by(id=1): print p.first_name
That gives the following:
Note that the
filter_by method is not acting on the results of
session.query. Rather, it is changing the SQL that
eventually is sent
to the database. You can see this by assigning printing to the query
object without executing it or putting it in an iteration context:
print session.query(Person).filter_by(id=1) SELECT people.id AS people_id, people.first_name AS people_first_name, people.last_name AS people_last_name, people.email AS people_email, people.birthday AS people_birthday FROM people WHERE people.id = :id_1
You also can see from this query that SQLAlchemy binds parameters to variables inside your query, rather than directly placing your values. Not only does this allow you to re-run queries later with different variable values, but it reduces the possibility that you will suffer from an SQL injection attack, which still is surprisingly common.
You also can order the results:
for person in session.query(Person).order_by('first_name'): print person.first_name Foo Reuven
And, you also can do all of the basic "CRUD" activities that you would expect. For example, you can create a new instance of your Person class and then save it to the database:
p = Person('newfirst', 'newlast', 'email@example.com', '1-jan-2012'); session.add(p) session.commit()
Notice how I can handle multiple inserts (or other actions) inside
a single transaction by only issuing
session.commit() after adding
several objects. Similarly, I can update the object and the
corresponding row in the database:
p.first_name = '!!!' session.add(p) session.commit()
I also can delete the object:
Reuven M. Lerner, Linux Journal Senior Columnist, a longtime Web developer, consultant and trainer, is completing his PhD in learning sciences at Northwestern University.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Happy GPL Birthday VLC!
- Unikernels, Docker, and Why You Should Care
- Handheld Emulation: Achievement Unlocked!
- Giving Silos Their Due
- Controversy at the Linux Foundation
- Don't Burn Your Android Yet
- Wine 1.8 Released
- Firefox OS