Event-Driven Programming with Twisted and Python

Before you turn your server app into a thundering herd of processes or a hairball of threads, consider this clean, logical event-driven way to do it. Download the 600-line proxy server example and follow along.
Handling Errors

In the event of an error, self.outgoingProxyError is called with a Failure object, which brings us to error handling. Python's traditional error handling is done through exceptions, a concept familiar to other high-level languages, such as Java (Listing 3).

Although Python's model of exception handling works exceptionally well (pun intended) for synchronous designs, it does not take into account asynchronous design. For example, when we initiate an outbound HTTP connection, Twisted continues processing other events while the connection is made. But, we want to specify behavior to address any problems that may occur at the time we request the connection. Fortunately, the good people making Twisted took this into account. Just as code is scheduled to run when a blocking operation completes successfully, it also can be scheduled to run in case of an error.

Twisted also handles all exceptions raised within the event loop, with hooks for developers to manage and log exceptions. This has an added benefit too: although an exception might abort a specific event from completing, it does not bring down the server, even if you haven't put any exception-handling code in place.

Twisted Classes and Event Handling

When using some of the Twisted classes, such as the LineReceiver class we're using, you can handle many events simply by adding methods with the correct names to your classes. Each time the protocol receives a line, the lineReceived method is invoked with the text of the line as an argument. Our SimpleHTTP class, which is intended to do minimal processing of an HTTP session, has methods such as these:

  • startNewRequest: invoked at the beginning of each request.

  • lineReceived: designed to facilitate chat-oriented protocols. Each time a line of text comes over the socket, this method automatically is called.

  • rawDataReceived: when sending a binary file or raw streams of data, it isn't reasonable to process information separated by newline characters. To account for this, LineReceiver lets us switch to raw mode transfer, in which case rawDataReceived is called instead of lineReceived.

  • handleFirstLine: HTTP works by starting each request with a single line. Generally, the client is sending a GET or POST request with a URI, and the server responds with a status code. handleFirstLine is used to handle either of these cases.

  • handleHeadersFinished: invoked when HTTP headers are sent fully.

  • handleRequestFinished: invoked when the HTTP request itself has completed.

Writing separate methods for states or actions that occur in the processing of a protocol is how Twisted programmers queue up events. At the beginning of a request, we can specify events to occur at each stage of handling a request. In our earlier example, we decided to call self.outgoingConnectionMade once a connection has been made. Let's take a look at that method, as shown in Listing 4.

Notice that outgoing_proxy represents the connection we are making to a remote server, on behalf of the Web browser we are serving. We're sending the HTTP request by calling outgoing_proxy.write. We're also scheduling the self.outgoingFirstlineReceived method to be called when a response is received from the remote server. The self.outgoingHeadersReceived method is called when the remote server has sent back all of its HTTP headers. Finally, self.handleOutgoingRequestFinished is called when the remote server has finished entirely responding to our outgoing HTTP request.

Although the outgoingConnectionMade method returns before any of this happens, we're lining up events to happen in the future. It well may be that while waiting for a response on one connection, ten other requests are opened and closed—all in the same thread. All information relevant to a connection is stored as instance data on protocol classes. Factories spawn protocol instances, protocol instances keep session states and deferred objects bind future data to event handlers. Completing the puzzle, the reactor manages the dirty work of polling sockets. This is the combination of tools upon which Twisted is built.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Great article... problems with proxy script

drgalaxy's picture

First off, great article exposing the features of Twisted and your neat proxy program. I have been interested in a particular aspect of proxies that has not been focused on in FOSS or commercial proxies, and this article/code is as an excellent educational base for this endeavor.

I installed python2.3-twisted and python2.3-twisted conch on my debian woody box and launched the program, accessing the Internet from a Win32 machine running Firefox 1.0. I noticed right off that pages like slashdot and even interactive.linuxjournal.com were stripped of ads, and that it didn't seem to be any slower on loading (yay!).

Then I decided to hit some really popular sites like msn (more people than you think's default webpage), yahoo, cnn, etc. I found that almost all links off of yahoo's front page left my browser sitting idly as though the server is timing out. The problems seem to be even worse on msn.com. The common trait between these sites that don't work is that their urls (at least at first) are all generated with some kind of hash so the user can be identified when they hit the link. ex: http://www.yahoo.com/_ylh=X3oDMTEwdnZjMjFhBF9TAzI3MTYxNDkEdGVzdAMwBHRtcG...
as opposed to "http://sports.yahoo.com/gamepreview"

In addition, I have been getting error msgs from python during program operation:
(preceded by traceback through various parts of twisted)
File "./SimpleDujunkingProxy.py", line 593, in clientConnectionFailed
self.defer.errback(Failure(reason, ProxyLostConnectionError))
exceptions.NameError: global name 'ProxyLostConnectionError' is not defined

These errors are printed to stdout but not necessarily at the same time as when the pages won't load correctly. Thanks again for the thought provoking article!

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState