Event-Driven Programming with Twisted and Python

Before you turn your server app into a thundering herd of processes or a hairball of threads, consider this clean, logical event-driven way to do it. Download the 600-line proxy server example and follow along.

Addressing the complexity of handling multiple sessions with one thread is at the core of a framework such as Twisted. Network sessions are represented by subclasses of the twisted.internet.protocol.Protocol class, such that each Protocol instance represents a network session. These objects are spawned by Factory objects, which inherit from twisted.internet.protocol.Factory. A singleton, twisted.internet.reactor, handles the dirty work of polling sockets and invoking events. Calling reactor.run() in Twisted simply starts the event loop, and run() exits when the application finishes, the same as an event loop in GTK or Qt.

The Proxy Server Example

Our proxy server has two kinds of networked chat sessions: incoming HTTP requests and their respective outgoing proxies. Because HTTP is a line-oriented, chat-like protocol, we can inherit our protocol class from Twisted's LineReceiver, which subclasses Protocol and adds line buffering that is useful for chat-like protocols such as HTTP. Twisted actually includes classes specifically for making and handling HTTP requests. We are writing our own partly because Twisted's prefab classes don't facilitate proxy serving and partly because it's a good programming exercise for this article.

Figure 1. Class diagram for a proxy server. The Protocol classes handle individual connections while the Factory classes create them.

Refer to Figure 1 for the class structure we are going to use. Twisted uses instances of the Factory classes to spawn a Protocol instance for each connection made. We create one SimpleHTTP class and derive from it the classes that manage incoming and outgoing traffic. Because HTTP is mostly the same for client and server, we can manage most of the lexical processing in one superclass and let subclasses do the rest, which is exactly how Twisted's own HTTP classes work.

Handling Callbacks

Operations you'd otherwise do with one or two methods tend to require several callback methods in event-driven programming. The rule of thumb is that any time there's a blocking operation you need to wait on, it happens outside your code and, therefore, between two of your methods. In the case of our proxy server, we can break the handling of a request into separate chunks. Most of what a proxy server does amounts to reading in data from a browser, making a few changes to that data and sending the modified data to the remote Web server. As of HTTP/1.1, multiple Web hits can be handled over one network connection. In Figure 2, you can see what happens to each request, keeping in mind that multiple requests can be made per HTTP connection. Arrows connecting boxes show which events are spawned and in what order.

Figure 2. Overall Steps in Processing Proxy Hits

In a blocking program, one might expect to handle opening a remote connection and sending it a line of text like this:

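import socket
# Each call blocks until the network operation completes.
connection = socket.create_connection((remote_server, remote_port))
connection.sendall(get_string)  # get_string must be bytes on Python 3
response = connection.makefile("rb").readline()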

We've all seen this kind of blocking code before, so what is different about the Twisted way? Because we don't want to wait around for the connection to be made in an event-driven program, we simply schedule some code to run when the remote server gets back to us. In Twisted, this kind of deferment is handled by using an instance of the twisted.internet.defer.Deferred class as a placeholder for the result you would expect from a blocking operation. For example, in our proxy server, we receive a Deferred object when we initiate a remote connection (Listing 2).

The self.outgoing_proxy_cache.getOutgoing method initiates an outbound proxy connection. It doesn't wait for the connection to be made before returning to the caller, however; it returns immediately. Having every method return as soon as possible is what makes a single-threaded server possible. Any and all CPU time taken by a method is spent processing, not waiting for external things to happen.

Notice how a Deferred object is returned in place of the connection object itself. By calling addCallback and addErrback on the Deferred object, we are scheduling future events to be fired, such that when an outbound connection is ready, the self.outgoingConnectionMade method is called. By passing uri as a second argument to addCallback, we are telling Twisted that self.outgoingConnectionMade should be called with uri as an additional argument.

______________________

Comments

Great article... problems with proxy script

drgalaxy

First off, great article exposing the features of Twisted and your neat proxy program. I have been interested in a particular aspect of proxies that has not been focused on in FOSS or commercial proxies, and this article/code is an excellent educational base for this endeavor.

I installed python2.3-twisted and python2.3-twisted conch on my debian woody box and launched the program, accessing the Internet from a Win32 machine running Firefox 1.0. I noticed right off that pages like slashdot and even interactive.linuxjournal.com were stripped of ads, and that it didn't seem to be any slower on loading (yay!).

Then I decided to hit some really popular sites like msn (more people than you think's default webpage), yahoo, cnn, etc. I found that almost all links off of yahoo's front page left my browser sitting idly as though the server is timing out. The problems seem to be even worse on msn.com. The common trait between these sites that don't work is that their urls (at least at first) are all generated with some kind of hash so the user can be identified when they hit the link. ex: http://www.yahoo.com/_ylh=X3oDMTEwdnZjMjFhBF9TAzI3MTYxNDkEdGVzdAMwBHRtcG...
as opposed to "http://sports.yahoo.com/gamepreview"

In addition, I have been getting error msgs from python during program operation:
(preceded by traceback through various parts of twisted)
File "./SimpleDujunkingProxy.py", line 593, in clientConnectionFailed
self.defer.errback(Failure(reason, ProxyLostConnectionError))
exceptions.NameError: global name 'ProxyLostConnectionError' is not defined

These errors are printed to stdout but not necessarily at the same time as when the pages won't load correctly. Thanks again for the thought provoking article!
