PyCon DC 2004
The Twisted talk essentially was its own mini-conference. The Twisted developers turned out in force, conducting several sprints on different Twisted topics, presenting a whole track's worth of session talks (including software that can be used both with Twisted and standalone) and drawing over thirty people to their BOF.
Divmod presented its Twisted-powered commercial Web mail site, whose source code is available at divmod.org if you'd like to compete with them. divmod.org distributes Atop (discussed above), Lupy (a full-text indexer), Nevow (discussed above), Pyindex (an indexer using Metakit), Quotient (the webmail server), Reverend (a Bayesian filter) and Stoom (Voice over IP). In addition to Web mail, divmod.com provides POP/IMAP access, integrated spamfiltering (you can override the Bayesian score), flexible searching (by phone number, URL, image, or "question" in the message), blogging, an image manager, contact list, calendar, to-do list and Internet phone (VoIP). They're working on blog-via-email and also multiplayer games. The cost will be somewhere under $10/month for Web mail and basic services, with additional fees for VoIP-to-landline, advanced games and the rest. As an open-source project, divmod.com also will be soliciting feature contributions from its users. A few attendees wondered what Chandler is trying to do that divmod.com isn't already doing. However, I couldn't find a link on divmod.com to sign up. Maybe signup-via-web is not quite ready yet, or maybe it needs a big "subscribe!" button.
Twisted is amazingly flexible and Pythonic, but that darn asynchronous event loop makes thread-happy people run in terror. You can't simply call functions that block, because blocking halts the entire server and makes everybody else's request wait. So you have to turn your program inside out to avoid blocking. Last conference I got excited about Twisted, and every few months thereafter I tried to understand this inside-out method of programming, but every time I hit a brick wall. Now, after talking with the developers at this conference (Donovan Preston, Nevow's inventor, was especially helpful) and reading the Deferred HOWTO three friggin' times, I think I'm finally getting somewhere.
First, there's probably a Twisted library for everything you need to do, so you can avoid the issue for a long time. Second, you mentally have to divide your code into chunks at the natural blocking points. Each chunk becomes a separate function, and you use callLater or addCallback to switch to the next function. This implies that for each blocking point you have to wrap two sections of code into functions. The first section is the slow function that runs after the pause; let's call the new function the "deferred job". The second portion is in the calling routine that depends on the slow function's return value; this is the "callback".
Let's say the slow function does a database lookup and the deferred job is the portion that comes after the I/O. The slow function uses callLater to schedule the deferred job and returns a Deferred instance (d) to the calling routine:
from twisted.internet import deferred, reactor # 'reactor' is the event loop instance that's running. d = defer.Deferred() # reactor.callLater(SECONDS_AS_FLOAT, FUNC, *args, **kw) reactor.callLater(2.5, deferredAction, d, arg2, arg3, kwarg='value') return d
callLater is like UNIX's at command: it schedules a background job to run at a future time. The job has no knowledge of its context, but because you can pass as many arguments as you want, it has all the info it needs. Of course, the job can be a bound method if you want to sneak in other information too. The job needs d, so pass it as an argument or sneak it in as an instance attribute.
The calling routine gets d instead of the result. How lovely, just what we didn't want. The calling routine has to register callback function(s) that will operate on the result.
d.addCallback(myCallback) return # Nothing more to do since I'll never get the result myself.
So d is the link between all the functions, which don't otherwise know about one another. Time passes and the deferred job is called. It produces a result and calls:
d.callback(result) # Oh, so *that's* why I needed 'd'.
Note: the deferred job does not return the result. d then calls all the callbacks that have been registered. The first one is called with one argument: the result. Any subsequent callbacks are called with the return value of the previous callback; this allows generic filter functions to serve as callbacks. The callbacks must completely dispose of the result by printing it or saving it somewhere, because it will be thrown away after the last callback finishes.
My two biggest hurdles with all this were, "What if the callback needs more info from the caller?" and "How does the slow function know when the result will be ready?" Let's look at these separately.
"What if the callback needs more info from the caller? I called the slow function because I wanted the result. I don't mind waiting, but why should I wait for somebody else to get the result? That's the biggest load of crap I've ever heard. I've got my local variables all nicely set, and the callback will need them." Because you can't pass arguments to the callback, you have to sneak the external data in as default values, nested-scope variables or instance attributes. If there's a lot of external data to sneak, using a bound method for the callback looks increasingly attractive.
"How does the slow function know when the result will be ready? It has to choose a certain number of seconds, but how does it know when a database query or socket read will be ready? Won't the deferred job just have to block anyway waiting for it?" The answer is so obvious I felt like a fool when I realized it. The deferred job does a non-blocking poll to see if the result is ready. If it's not, the job schedules itself to run again with all its same arguments after another interval. It avoids calling d.callback until the result is known. If the result is never known, the callbacks never are called, but that's what we want. There are optional errbacks and timeouts (which we won't discuss here) to handle exceptional situations.
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- RSS Feeds
- Introduction to MapReduce with Hadoop on Linux
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?