Event-Driven Programming with Twisted and Python
In the beginning, there were forking servers and then came threaded servers. Although they manage a few concurrent connections well, when network sessions reach into the hundreds or even thousands, forking and threading servers spawn too many separate, resource-consuming processes to be efficient. Today, there is a better way, asynchronous servers. A new breed of frameworks for third-generation languages is taming the once complex world of event-driven programming.
A rising star in the Python community has been Twisted, which makes asynchronous programming simple and elegant while providing a massive library of event-driven utility classes. In this article, I discuss asynchronous event-driven programming and how it's done in Twisted. Because reading about code only gets you so far, I cite examples from a real Twisted application developed for this article: a simple proxy server that blocks unwanted cookies, images and connections. Instructions on how to get the complete source code are in the on-line Resources.
The Twisted Project has been gaining popularity as a powerful and increasingly stable way of implementing networked applications. At its core, Twisted is an asynchronous networking framework. But unlike other such frameworks, Twisted boasts a rich set of integrated libraries for handling common protocols and programming tasks, such as user authentication and even remote object brokering. One of the philosophies behind Twisted is breaking down traditional separations among toolkits, as the same server that serves Web content could resolve DNS lookups. Although the package itself is quite large, applications need not import all the components of Twisted, so run-time overhead is kept to a minimum.
As with Python, Twisted's user base has been expanding from its academic roots to the commercial and government sectors. At Zoto, we're using Twisted in a distributed photo storage and management application, because it enables us to develop scalable network software quickly in a famously productive language, Python. Programming day to day, I appreciate Twisted for its impressive toolkit and supportive community. And as with all community-oriented open-source projects, Twisted is a safe business bet, because its existence doesn't hinge on the continued support of any single company or institution.
Have you ever been standing in the express lane of a grocery store, buying a single bottle of water, only to have the customer in front of you challenge the price of an item, causing you and everyone behind you to wait five minutes for the price to be verified? Plenty of explanations of asynchronous programming exist, but I think the best way to understand its benefits is to wait in line with an idle cashier. If the cashier were asynchronous, he or she would put the person in front of you on hold and conduct your transaction while waiting for the price check. Unfortunately, cashiers are seldom asynchronous. In the world of software, however, event-driven servers make the best use of available resources, because there are no threads holding up valuable memory waiting for traffic on a socket. Following the grocery store metaphor, a threaded server solves the problem of long lines by adding more cashiers, while an asynchronous model lets each cashier help more than one customer at a time.
This isn't to say there aren't benefits to a threaded model. For instance, with microthreads, the amount of resources used by any particular thread is reduced substantially. There's an inherent complexity in asynchronous programming, especially when you need to do many blocking operations in succession. In Python, however, the benefits of threading are diminished by Python's Global Interpreter Lock (GIL). Threaded programming in Python is refreshingly simple, because all internal Python operations are thread-safe. To add an item to a list or set a dictionary key, no locks are required, so as to avoid race conditions among threads. Unfortunately, this is implemented through an interpreter-wide lock that Python's interpreter uses liberally. So, although two threads safely can append to the same list at the same time, if they're appending to two different lists, the same lock is used. Because threaded Python applications suffer a resulting performance hit, asynchronous single-thread programming is all the more desirable for a language such as Python.











This week 5 lucky Members will receive a copy of The Official Ubuntu Server Book by Benjamin Mako Hill and Linux Journal's very own Kyle Rankin. No entry necessary. Check back here early next week to find out who the lucky Online Members are.




Comments
Great article... problems with proxy script
First off, great article exposing the features of Twisted and your neat proxy program. I have been interested in a particular aspect of proxies that has not been focused on in FOSS or commercial proxies, and this article/code is as an excellent educational base for this endeavor.
I installed python2.3-twisted and python2.3-twisted conch on my debian woody box and launched the program, accessing the Internet from a Win32 machine running Firefox 1.0. I noticed right off that pages like slashdot and even interactive.linuxjournal.com were stripped of ads, and that it didn't seem to be any slower on loading (yay!).
Then I decided to hit some really popular sites like msn (more people than you think's default webpage), yahoo, cnn, etc. I found that almost all links off of yahoo's front page left my browser sitting idly as though the server is timing out. The problems seem to be even worse on msn.com. The common trait between these sites that don't work is that their urls (at least at first) are all generated with some kind of hash so the user can be identified when they hit the link. ex: http://www.yahoo.com/_ylh=X3oDMTEwdnZjMjFhBF9TAzI3MTYxNDkEdGVzdAMwBHRtcGwDaW5kZXgtY3Nz/s/228814
as opposed to "http://sports.yahoo.com/gamepreview"
In addition, I have been getting error msgs from python during program operation:
(preceded by traceback through various parts of twisted)
File "./SimpleDujunkingProxy.py", line 593, in clientConnectionFailed
self.defer.errback(Failure(reason, ProxyLostConnectionError))
exceptions.NameError: global name 'ProxyLostConnectionError' is not defined
These errors are printed to stdout but not necessarily at the same time as when the pages won't load correctly. Thanks again for the thought provoking article!
Post new comment