Event-Driven Programming with Twisted and Python

Before you turn your server app into a thundering herd of processes or a hairball of threads, consider this clean, logical event-driven way to do it. Download the 600-line proxy server example and follow along.

In the beginning, there were forking servers, and then came threaded servers. Although they manage a few concurrent connections well, when network sessions reach into the hundreds or even thousands, forking and threaded servers spawn too many separate, resource-consuming processes or threads to be efficient. Today, there is a better way: asynchronous servers. A new breed of frameworks for third-generation languages is taming the once complex world of event-driven programming.

A rising star in the Python community has been Twisted, which makes asynchronous programming simple and elegant while providing a massive library of event-driven utility classes. In this article, I discuss asynchronous event-driven programming and how it's done in Twisted. Because reading about code only gets you so far, I cite examples from a real Twisted application developed for this article: a simple proxy server that blocks unwanted cookies, images and connections. Instructions on how to get the complete source code are in the on-line Resources.

What Is Twisted?

The Twisted Project has been gaining popularity as a powerful and increasingly stable way of implementing networked applications. At its core, Twisted is an asynchronous networking framework. But unlike other such frameworks, Twisted boasts a rich set of integrated libraries for handling common protocols and programming tasks, such as user authentication and even remote object brokering. One of the philosophies behind Twisted is breaking down traditional separations among toolkits, so that the same server that serves Web content can also resolve DNS lookups. Although the package itself is quite large, applications need not import all the components of Twisted, so run-time overhead is kept to a minimum.

As with Python, Twisted's user base has been expanding from its academic roots to the commercial and government sectors. At Zoto, we're using Twisted in a distributed photo storage and management application, because it enables us to develop scalable network software quickly in a famously productive language, Python. Programming day to day, I appreciate Twisted for its impressive toolkit and supportive community. And as with all community-oriented open-source projects, Twisted is a safe business bet, because its existence doesn't hinge on the continued support of any single company or institution.

What Is Asynchronous Programming?

Have you ever been standing in the express lane of a grocery store, buying a single bottle of water, only to have the customer in front of you challenge the price of an item, causing you and everyone behind you to wait five minutes for the price to be verified? Plenty of explanations of asynchronous programming exist, but I think the best way to understand its benefits is to wait in line with an idle cashier. If the cashier were asynchronous, he or she would put the person in front of you on hold and conduct your transaction while waiting for the price check. Unfortunately, cashiers are seldom asynchronous. In the world of software, however, event-driven servers make the best use of available resources, because there are no threads holding up valuable memory waiting for traffic on a socket. Following the grocery store metaphor, a threaded server solves the problem of long lines by adding more cashiers, while an asynchronous model lets each cashier help more than one customer at a time.
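
To make the metaphor concrete, here is a minimal sketch of a single-threaded event loop written with Python's standard selectors module rather than Twisted. The serve function, the customer names and the "done: " reply are illustrative assumptions, not part of the article's proxy. One loop plays the lone cashier: customer A is stuck on a price check and sends nothing, so the loop serves customer B, who is ready, instead of waiting.

```python
import selectors
import socket


def serve(server_sides, expected):
    """Single cashier: wait until any socket is readable, then handle it."""
    sel = selectors.DefaultSelector()
    for name, sock in server_sides.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    served = []
    while len(served) < expected:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nobody became ready within the timeout; stop the demo
        for key, _mask in events:
            request = key.fileobj.recv(1024)
            key.fileobj.sendall(b"done: " + request)
            served.append((key.data, request))
    sel.close()
    return served


# Two "customers", each connected to the cashier by a socket pair.
a_server, a_client = socket.socketpair()
b_server, b_client = socket.socketpair()

# Customer A is waiting on a price check and sends nothing yet.
# Customer B is ready, so the loop serves B without blocking on A.
b_client.sendall(b"one bottle of water")
served = serve({"A": a_server, "B": b_server}, expected=1)
reply = b_client.recv(1024)

for sock in (a_server, a_client, b_server, b_client):
    sock.close()
```

A framework such as Twisted wraps this kind of loop in its reactor, so application code only supplies the handlers that run when an event arrives.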

This isn't to say a threaded model has no benefits. With microthreads, for instance, the resources used by any particular thread are reduced substantially. Asynchronous programming also carries an inherent complexity, especially when you need to perform many blocking operations in succession. In Python, however, the benefits of threading are diminished by the Global Interpreter Lock (GIL). Threaded programming in Python is refreshingly simple, because individual operations on built-in types are thread-safe: adding an item to a list or setting a dictionary key requires no explicit lock to avoid race conditions among threads. Unfortunately, this safety is implemented through a single interpreter-wide lock that Python's interpreter uses liberally. So, although two threads can safely append to the same list at the same time, two threads appending to two different lists still contend for the same lock. Because threaded Python applications suffer a resulting performance hit, asynchronous single-threaded programming is all the more desirable for a language such as Python.

Accepting Connections and Sending Responses

Let's start with a simple example of a server that accepts connections on port 1100. For each connection, it sends the UNIX time and closes the socket.



Great article... problems with proxy script

drgalaxy

First off, great article exposing the features of Twisted and your neat proxy program. I have been interested in a particular aspect of proxies that has not been focused on in FOSS or commercial proxies, and this article/code is an excellent educational base for this endeavor.

I installed python2.3-twisted and python2.3-twisted conch on my Debian woody box and launched the program, accessing the Internet from a Win32 machine running Firefox 1.0. I noticed right off that pages like slashdot and even interactive.linuxjournal.com were stripped of ads, and that they didn't seem to load any slower (yay!).

Then I decided to hit some really popular sites like msn (the default home page for more people than you might think), yahoo, cnn, etc. I found that almost all links off yahoo's front page left my browser sitting idle, as though the server were timing out. The problems seem to be even worse on msn.com. The common trait among the sites that don't work is that their URLs (at least at first) are all generated with some kind of hash so the user can be identified when they hit the link. For example: http://www.yahoo.com/_ylh=X3oDMTEwdnZjMjFhBF9TAzI3MTYxNDkEdGVzdAMwBHRtcG...
as opposed to "http://sports.yahoo.com/gamepreview"

In addition, I have been getting error msgs from python during program operation:
(preceded by traceback through various parts of twisted)
File "./SimpleDujunkingProxy.py", line 593, in clientConnectionFailed
self.defer.errback(Failure(reason, ProxyLostConnectionError))
exceptions.NameError: global name 'ProxyLostConnectionError' is not defined

These errors are printed to stdout, but not necessarily at the same time as the pages fail to load correctly. Thanks again for the thought-provoking article!