PyCon DC 2003

by Mike Orr

This article has been updated as of 12pm, April 18.

PyCon DC 2003 took place March 26-28, 2003, at George Washington University in Washington, DC. It was Python's first do-it-yourself conference, organized by Pythoneers dissatisfied with the $500 entrance fees associated with professionally run conferences. This conference cost early-bird registrants a paltry $150, low enough to allow some to attend who otherwise could not have. The minimalist approach worked; rather than bankrupting the Python Software Foundation (PSF) as some feared, it appears to have generated a modest profit, around $200 (preliminary estimate). Some twenty people who registered but didn't attend helped the bottom line. We appreciate your money--and we ate your lunch too.

The conference attracted almost 250 registrants, the same number as attended last year's Python10. That's pretty good considering the untried format, the three other Python conferences this year (OSCon, Python UK and EuroPython), the high unemployment rate, the current difficulty in obtaining US visas, fear of terror attacks and the fact that Gulf War II broke out less than a week before the conference. Some 20% of the attendees braved the obstacles and came from overseas anyway.

To keep costs low, the organizers chose a university setting rather than a hotel. The DC metro stop on the university campus helped make it easy for attendees to commute from cheap accommodations. The word "cheap" in the previous sentence is a joke; nothing is cheap in DC, especially not the food--$6 gets you a Jamba Juice smoothie and a cookie. I did score a $20 hostel in the Adams-Morgan neighborhood, however.

The theme of this year's conference was Popularizing Python. Steve Holden, the conference chairman, noted that attendees weren't only geeks but a good mix of scientists, educators, programmers, writers and entertainers, all of whom worked together and became colleagues.

PyCon Firsts

The do-it-yourself nature of the event was manifest in the schedule. In addition to the keynote speeches, refereed-paper presentations and lightning talks (15-minute unrefereed speeches), there was something called "open space". Three and a half hours were set aside over two days for informal roundtable talks and discussions, akin to birds of a feather (BoF) sessions. After the keynote, attendees had the option of writing topics on colored pieces of paper, sticking them on the schedule board, announcing the topics into the microphone and then waiting at the designated time to see if anybody showed up. Almost all of the schedule slots filled up, even with two or three open-space meetings happening simultaneously. Fortunately, the room was big enough that sessions could go beyond their 15-minute time slots if necessary. Most open-space meetings attracted around ten people.

Another first this year was the sprint, time set aside for hacking together. In this case, two days were set aside before the conference for four sprints: Zope, the Python core, Twisted and Webware. Each sprint had four to ten participants. Most sprints required a relatively high level of programming experience to participate, but the Webware sprint was open to future developers to give them a chance to get their feet wet.

Another first was wireless access points in every room and in the foyer. This allowed sprinters and attendees to simply open their laptops and have immediate internet access. A limited number of wired ports were available for those without wireless cards. At least a third of the laptops present were Macintosh PowerBooks.

My Sprint and Open Space

I was in the Webware sprint. (Webware is a web application server.) We identified five top to-do items: developers' documentation, a user manager with roles and permissions, spec-ing out a content management system and a component architecture for optional features (to avoid the overuse of inheritance). We chose to work on developers' documentation, as it was the key to getting the other things done. One guy added docstrings to the source, another wrote unit tests, another started diagramming the class structure in UML and I started writing an architectural overview from the perspective of a web request (transaction).

I also led an open-space discussion on PyYAML, the Python library for YAML (YAML Ain't Markup Language), which is a human-readable data serialization format almost, but not entirely, unlike XML. Being part of the team that's writing the next version of PyYAML, I needed to get input from users about what tasks they were using PyYAML for and what features they needed. We also introduced YAML to those who hadn't seen it.

Both these events brought the developers in contact with people we hadn't known who were interested in the project. In Webware's case, it was one guy in DC who had recently become a Webware enthusiast. In PyYAML's case, it was three Zope people who wanted to use YAML in Zope, each in a different way.

The Hundred-Year Language

Paul Graham gave the keynote address on the programming language he envisions we'll be using in a hundred years. He says such speculation is not far-fetched, because programming has already existed for almost fifty years and has changed remarkably little. In a hundred years we'll still be writing programs in text files of some sort, and there will still be something called Fortran.

Graham says the languages that will survive the evolutionary change are those with the fewest axioms (primitives), because they are the most flexible. As hardware becomes ever more powerful, "inefficient" but flexible programming practices become feasible, and the languages that support them are thriving because they make work easier for programmers. For instance, Python's dynamic typing is more flexible than C's or Java's static type declarations. Graham calls his search for flexibility a search for creative ways to "waste" CPU cycles in the future.

Graham recommends an even higher-level abstraction for data types. For instance, there's no need for a separate string primitive because a string can be represented as a character array (list). So let strings be dealt with through library routines rather than being built into the language. Arrays aren't necessary because they are simply a special case of mappings (dictionaries). His most radical suggestion is the elimination of numeric types, because a number N can be represented by an array of length N. There are even proofs that mathematical operations, such as adding and multiplying, can be performed on arrays as easily as on special numeric types--even negative and fractional numbers, I'm told. Don't worry about this type surgery infecting Python: Guido wasn't buying it. And Graham says Python is already closer to his ideal than most languages are.
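To make the numbers-as-arrays idea concrete, here is a toy sketch of my own (not code from the talk). The natural number N is a list of N placeholder items; addition is list concatenation and multiplication is repeated concatenation. Only non-negative integers are handled. The helper names `num`, `value`, `add` and `mul` are made up for this illustration.

```python
# Toy sketch: the natural number N is represented as a list of length N.
# Only non-negative integers are covered; negatives and fractions are left out.

def num(n):
    """Encode the natural number n as a list of n placeholder items."""
    return [None] * n

def value(xs):
    """Decode a list-number back to an ordinary int (for display only)."""
    return len(xs)

def add(a, b):
    # a + b: join the two lists end to end.
    return a + b

def mul(a, b):
    # a * b: concatenate one copy of a for each element of b.
    result = []
    for _ in b:
        result = add(result, a)
    return result

three, four = num(3), num(4)
print(value(add(three, four)))  # 7
print(value(mul(three, four)))  # 12
```

This version is hopelessly inefficient, which is exactly Graham's point: it spends CPU cycles to avoid a dedicated numeric primitive.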

Graham concluded by saying it's great that programming languages are being written now by programmers rather than by professors, because programmers have practical needs that professors tend to ignore. Somebody also asked what the killer application will be in a hundred years. The answer: games with skins, of course.

Twisted

Every Python conference offers one talk that has the biggest effect on me. At Python9 it was the Webware talk; at Python10 it was Tim Berners-Lee's talk on web-izing Python and RDF. Here at PyCon it was the Twisted tutorial. Twisted's developers were out in force, both at the sprint and with an entire track of refereed papers. Previously I had seen the Twisted web site and thought, "confusing, too big a learning curve, nothing I need", but hearing it explained in person really helped. Twisted is a modular platform for building internet applications, essentially an object-oriented layer that integrates Python's networking libraries and combines them with common internet services. It uses a single-threaded, non-blocking form of I/O, which its developers claim is significantly faster than multithreaded or multiprocess paradigms. Twisted has a few basic classes:

Reactor

The network event loop

Transport

TCP, SSL, UNIX domain sockets, etc.

Protocol

An interchange format for data (HTTP, SSH, DNS, etc.)

Factory

Creates a protocol-transport instance for a new client connection

Producer

Generates data at the rate a Consumer needs it

Service

Your program-specific logic; aka "business logic"

Application

Encapsulates all of the above

twistd

The dæmon that runs your application
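Twisted is a third-party package, so rather than guess at its API, the sketch below shows the same division of labor with Python's standard asyncio module, which appeared years after this conference and borrowed its protocol/transport design from Twisted (PEP 3156). The event loop plays the reactor, an `asyncio.Protocol` subclass handles the data, asyncio supplies the TCP transport, and the class passed to `create_server` acts as the factory. The `Echo` protocol, the loopback address and the use of port 0 are assumptions for illustration.

```python
import asyncio

class Echo(asyncio.Protocol):
    """Protocol: decides what to do with bytes; knows nothing about sockets."""

    def connection_made(self, transport):
        # The loop hands us a transport (TCP here) for this connection.
        self.transport = transport

    def data_received(self, data):
        # Echo whatever arrives straight back through the transport.
        self.transport.write(data)

async def main():
    loop = asyncio.get_running_loop()  # plays the role of the reactor
    # Echo acts as the factory: the loop calls it once per new connection.
    server = await loop.create_server(Echo, "127.0.0.1", 0)  # port 0 = any free port
    port = server.sockets[0].getsockname()[1]

    # A quick client running on the same single-threaded loop.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.readexactly(5)
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))  # b'hello'
```

Both the server and the client run in one thread; neither blocks the other because every read and write goes through the event loop.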

One of Twisted's main features is the separation of transport from protocol. That makes changing a program's transport a simple two-line process, including one line for the import. It's useful for debugging programs from an alternate I/O source or for adapting to unforeseen circumstances.
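Here is a small illustration of that separation. It uses asyncio's `Protocol`/`Transport` interface as a stand-in for Twisted's (asyncio borrowed the design), so treat it as an analogy rather than Twisted code. The hypothetical `Shout` protocol is driven through a made-up in-memory transport instead of a socket. This is the kind of swap that makes it easy to debug a protocol from an alternate I/O source.

```python
import asyncio

class Shout(asyncio.Protocol):
    """Hypothetical protocol: replies to each chunk with its upper-case form."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data.upper())

class MemoryTransport(asyncio.Transport):
    """Hypothetical in-memory transport that records what the protocol writes."""

    def __init__(self):
        super().__init__()
        self.written = bytearray()

    def write(self, data):
        self.written.extend(data)

# Swap in the fake transport: no sockets or event loop, just direct calls.
proto = Shout()
transport = MemoryTransport()
proto.connection_made(transport)
proto.data_received(b"twisted")
print(bytes(transport.written))  # b'TWISTED'
```

The protocol class is unchanged. Only the object passed to `connection_made` differs.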

As for the kinds of applications Twisted supports, somebody once said, "All applications eventually evolve to the point that you can read e-mail from them." Twisted can do that, as well as provide an SSH interface, a DNS server, a MUD game server and whatever else your application might want. The libraries are only an import statement away.

Why does Twisted have an SSH server? Why do people climb mountains? Because they're there; because they can. And it turns out that Twisted's SSH is faster than OpenSSH, because the latter forks twice for each connection. Why use Twisted's DNS rather than BIND? It's much smaller and simpler, so it's less prone to security alerts. Why use Twisted Web rather than Apache? The configuration file is much easier, employing ordinary Python method calls rather than a special configuration syntax.

For GUI programming, alternate reactors integrate with various GUI event loops (GTK, Qt and so on). There are also libraries for accessing DB-API-compliant databases (MySQL, etc.). They are necessary because Twisted is single-threaded: if you use the standard database modules directly, their blocking I/O will freeze all the other connections. There is also a library for running your business logic in a separate thread, in case you have to do blocking I/O for any other reason.
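The general remedy is to push blocking calls into a worker thread and hand the result back to the single-threaded loop. The sketch below shows that pattern with asyncio's `run_in_executor`, used here as a stand-in rather than Twisted's own API. A blocking sqlite3 query runs in the default thread pool while another coroutine keeps running on the event loop. The table, the data and the artificial `time.sleep` delay are invented for illustration.

```python
import asyncio
import sqlite3
import time

def slow_query():
    """Blocking DB-API call; meant to run in a worker thread, not on the loop."""
    # By default a sqlite3 connection may only be used by the thread that
    # created it, so the connection is opened inside the worker.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?)", [("ann",), ("bob",)])
        time.sleep(0.2)  # pretend the database is slow
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()

async def heartbeat(ticks):
    # Keeps running on the loop while the query blocks its own thread.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    # None selects the loop's default ThreadPoolExecutor.
    count, _ = await asyncio.gather(
        loop.run_in_executor(None, slow_query),
        heartbeat(ticks),
    )
    return count, len(ticks)

print(asyncio.run(main()))  # (2, 3)
```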

Twisted Web has a simple transaction model. A URL such as /foo/bar is translated to:

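root.getChild('foo', request).getChild('bar', request).render(request)

Here root is the site's root resource. It and the children returned for foo and bar are all twisted.web.resource.Resource objects. You define .getChild() and .render() in your Resource subclasses.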
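Below is a self-contained sketch of that traversal model. The `Resource`, `NotFound` and `traverse` names are hypothetical stand-ins written for this example, not Twisted's code. Real Twisted resources do more, but the lookup works the same way: one `getChild()` call per path segment, then `render()` on the last resource.

```python
class Resource:
    """Hypothetical minimal resource: named children plus a render method."""

    def __init__(self, name="root"):
        self.name = name
        self.children = {}

    def getChild(self, path, request):
        # Return the child for one path segment, or a "not found" resource.
        return self.children.get(path, NotFound(path))

    def render(self, request):
        return "<p>%s</p>" % self.name

class NotFound(Resource):
    def render(self, request):
        return "<p>404: %s not found</p>" % self.name

def traverse(root, url, request=None):
    """Turn /foo/bar into root.getChild('foo', ...).getChild('bar', ...).render(...)."""
    resource = root
    for segment in url.strip("/").split("/"):
        if segment:  # skip the empty segment produced by "/"
            resource = resource.getChild(segment, request)
    return resource.render(request)

root = Resource()
foo = Resource("foo")
root.children["foo"] = foo
foo.children["bar"] = Resource("bar")

print(traverse(root, "/foo/bar"))  # <p>bar</p>
print(traverse(root, "/foo/baz"))  # <p>404: baz not found</p>
```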

Documentation Formats

Two talks on documentation formats were presented: reST and Twisted Lore. reST (reStructuredText) is a simple ASCII format for rich text that is on its way to becoming a Python standard. The input is similar to a Wiki format, and the output is (officially) HTML, (unofficially) LaTeX and (experimentally) DocBook.

Lore uses an XHTML variant for input and produces HTML and LaTeX output. XHTML was chosen over LaTeX and DocBook for input because it dramatically lowers the bar for participation: everybody writes documentation in HTML nowadays anyway, and making it XML-conformant requires only a few small changes, like always closing your <P> tags and writing <BR> as <BR />. Special <DIV>, <SPAN> and <PRE> classes (pseudo styles) handle footnotes, syntax highlighting of Python source code and so on. Lore adds TOC links at the top for each of your <H1> headers, puts previous/next links on the pages and generates a crude index.html. The LaTeX output format allows conversion to PostScript and PDF. The lore command has a lint option to check the input format, and it even checks the syntax of your embedded Python listings. It also warns if the listings are wider than 80 characters.
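The following rough sketch shows the kinds of checks described above, using only the standard library. It is not Lore's implementation. The `lint` function and the sample documents are illustrative, and it assumes Python listings are marked as `<pre class="python">`. The function checks XML well-formedness, compiles each Python listing to catch syntax errors, and flags listing lines wider than 80 characters.

```python
import xml.etree.ElementTree as ET

def lint(xhtml, width=80):
    """Return a list of problem descriptions for an XHTML-style document."""
    try:
        root = ET.fromstring(xhtml)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]  # e.g. an unclosed <br>

    problems = []
    for pre in root.iter("pre"):
        if pre.get("class") != "python":
            continue
        source = pre.text or ""
        try:
            compile(source, "<listing>", "exec")  # syntax check only; nothing runs
        except SyntaxError as err:
            problems.append("listing syntax error: %s" % err.msg)
        for line in source.splitlines():
            if len(line) > width:
                problems.append("listing line wider than %d chars" % width)
    return problems

good = '<html><body><p>Hi<br /></p><pre class="python">print("ok")</pre></body></html>'
bad_xml = "<html><body><p>Hi<br></p></body></html>"
bad_py = '<html><body><pre class="python">def f(:\n    pass</pre></body></html>'

print(lint(good))     # []
print(lint(bad_xml))  # one "not well-formed XML" message
print(lint(bad_py))   # one "listing syntax error" message
```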

Other Talks

The highest-quality talk I attended was the threads tutorial. It was well organized and clear, making an often-confusing subject seem simple. Aahz explained the global interpreter lock (GIL), the threading module and why queues are the greatest thing since sliced bread.
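Queues are appealing because of the producer/consumer pattern: threads never share mutable state directly, but only pass items through a `Queue`, which does its own locking. Here is a minimal sketch using the standard `threading` and `queue` modules (the module was spelled `Queue` in the Python 2 of the day). The squaring work and the `None` shutdown sentinel are my choices for illustration, not material from the tutorial.

```python
import queue
import threading

def worker(tasks, results):
    """Consumer: pull numbers off the task queue until the sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:          # sentinel: no more work for this thread
            tasks.task_done()
            break
        results.put(n * n)     # the Queue handles all locking
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):            # producer
    tasks.put(n)
for _ in threads:              # one sentinel per worker
    tasks.put(None)

tasks.join()                   # wait until every item is marked done
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(results.qsize()))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```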

There were a couple of talks on configuration-file parsers, including Zope's ZConfig and Leonard Richardson's Beyond the Config File framework, as well as my PyYAML open-space session.

Ian Bicking, a Webware developer, led a "Web Framework Shootout". He wrote a paper describing and comparing several of the non-Zope frameworks. In the beginning, remember, there was Zope. But Zope had its discontents, who developed CherryPy, SkunkWeb, Quixote, Webware and others, which are as modular and Pythonic as Zope is monolithic and weird. (Twisted also is modular, but it's more than a mere web application server.) Ian argues that this multiplicity of servers may have solved one problem, but it created another: so many incompatible interfaces inhibit code sharing and content sharing between them. So he pleads, "Don't invent any more web application servers. Instead, improve the ones that exist, and help them consolidate into a few killer apps." Some attendees objected and said the ability to write your own application server easily is one of Python's strengths, and who cares if there are five or five hundred of them.

Another debate revolved around which model is best: single-threaded (Twisted), multithreaded (Webware, Zope) or multiprocess (SkunkWeb, Apache). Somebody asked whether any of Python's web application servers had been deployed in situations with over 200 users. This brought a response from somebody who had worked on Yahoo Mail. Yahoo Mail was written in Python at a time when none of the other servers existed. It scaled to several million users with ease, although it was later replaced with C++.

The Zope team continues to be responsive to the requests and complaints of Zope application developers. In my Python9 article, I wrote about how the pleas of Zope programmers for support led the Zope team to beef up the documentation and to encourage the previously discouraged Python methods (external methods) for complex logic. In Zope 3 (unreleased) they've eliminated implicit acquisition, due to complaints from developers that it's too error-prone. Now, if you want to acquire an attribute/method from a runtime container (folder), you must ask for it explicitly. In other changes, Zope 3 uses an XML (.zcml) file for configuration. Some developers also are considering running Zope on Twisted.
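For readers who haven't met acquisition: an object inside a Zope folder could transparently pick up attributes from the containers above it. The toy sketch below contrasts implicit lookup, where `__getattr__` silently walks up the container chain, with an explicit `acquire()` call. The `Item` and `ImplicitItem` classes and the `acquire` method are hypothetical. They mimic the idea only and are not Zope's Acquisition API.

```python
class Item:
    """Hypothetical object living inside a container (its 'parent')."""

    def __init__(self, parent=None, **attrs):
        self.parent = parent
        self.__dict__.update(attrs)

    def acquire(self, name):
        """Explicit acquisition: look here, then up the container chain."""
        node = self
        while node is not None:
            if name in vars(node):
                return vars(node)[name]
            node = node.parent
        raise AttributeError(name)

class ImplicitItem(Item):
    def __getattr__(self, name):
        # Called only when normal lookup fails; silently climbs the chain.
        # This models the convenient but error-prone implicit behavior.
        return self.acquire(name)

root = Item(title="Site root", stylesheet="default.css")
folder = Item(parent=root, title="Docs")
page = Item(parent=folder)
legacy_page = ImplicitItem(parent=folder)

print(page.acquire("stylesheet"))   # default.css  (asked for explicitly)
print(legacy_page.stylesheet)       # default.css  (found implicitly)
print(hasattr(page, "stylesheet"))  # False: no surprise inheritance
```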

Some of the talks I didn't attend covered topics such as unit testing, PyChecker (a lint for Python source files), Jython, managing large project releases, the Semantic Web (RDF), Numarray (an alternative to Numeric Python), a .Net wrapper (Korba) and others.

Guido Speaks

And now, what everybody has been waiting for--the Great Guido discussing Python's evolution. Downloads of Python at python.org continue to increase with every release since 2.0, although not dramatically. The PSF can accept donations now, so maybe it's time to spend a little money to bootstrap Python marketing.

Guido continued the tradition of summarizing the flame war du jour. He claims to have a knack for punting controversial topics to comp.lang.python to hash out, not realizing which topics are weapons of mass destruction. The ternary operator proposal (akin to C/Perl's ?: operator) turned out to be even more explosive than last year's boolean-type proposal. Two public votes proved inconclusive beyond the fact that 50% of the respondents want a ternary operator and 50% don't. Guido is leaning toward the following syntax if he decides to implement it:

result = if C1: x1 elif C2: x2 else: y

This syntax got slightly more votes in both elections than the other proposals:

 
result = C ? x : y
    # Pro: like C/Perl.  Con: symbols are unpythonic.
result = if C then x else y
    # Con: expression form has 'then' but statement form doesn't.
result = x if C else y
    # Pro: like list comprehensions.  Con: order of expressions is confusing.
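Without a ternary operator, Python programmers of the time typically wrote `C and x or y`. That idiom has a well-known trap: it returns `y` whenever `x` is false, even if `C` is true. The sketch below shows the trap next to the `x if C else y` form, which Python 2.5 later adopted (PEP 308), so the example runs on any modern Python. The two function names are made up for the comparison.

```python
def pick_and_or(cond, x, y):
    # The old idiom: silently wrong when x is falsy (0, "", None, [] ...).
    return cond and x or y

def pick_ternary(cond, x, y):
    # Conditional expression, available since Python 2.5 (PEP 308).
    return x if cond else y

print(pick_and_or(True, 0, 99))    # 99  (wrong: the condition was true)
print(pick_ternary(True, 0, 99))   # 0
print(pick_ternary(False, 0, 99))  # 99
```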

Guido then went over Tim Peters's list of 20 "Zen of Python" items: "Beautiful is better than ugly. Explicit is better than implicit. Readability counts. Errors should never pass silently. In the face of ambiguity, refuse the temptation to guess." Two maxims deserve special mention:

  • "There should be one--and preferably only one--obvious way to do it." This is the most-violated rule because sometimes you come up with a better way later, and the obvious way is not always obvious. That leads to the following rule.

  • "...although that way may not be obvious unless you're Dutch." Guido used some obscure reference to Holland's Christian past to explain this. For example, he said it's obvious (if you're Dutch) that a tuple is not an immutable list. A list is a collection of things that have something in common (whether they are the same type or not). Usually you don't know ahead of time how many there will be. A tuple is a group of unrelated things that want to stay together. Even if Python had immutable lists, it would have tuples too.
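The following small sketch illustrates the list/tuple distinction. The data is invented. A list holds a variable number of related things. A tuple is a fixed-shape record whose positions mean different things. Because tuples are immutable, they can also serve as dictionary keys when their items are hashable.

```python
# A list: a variable-length collection of things with something in common.
attendees = ["Guido", "Tim", "Aahz"]
attendees.append("Steve")            # lists are meant to grow and shrink

# A tuple: a fixed group of unrelated values that belong together,
# like a record of (name, city, year).
talk = ("PyCon DC", "Washington", 2003)
name, city, year = talk              # unpack by position

# Immutable tuples of hashable items can be dict keys: (day, hour) -> room.
rooms = {(26, 9): "Room A", (26, 10): "Room B"}
print(rooms[(26, 10)])               # Room B
print(len(attendees), city)          # 4 Washington
```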

Guido proposed one addition to the Zen: "Be kind on Usenet, some posters are only eleven years old."

Where does Guido want to take Python? He wants to reduce feature duplication: the string module, xrange, classic classes, short integers, 8-bit strings (to be replaced by Unicode for strings and by a mutable byte array for binary data), map and filter, lambda, != vs <>. He wants to do some efficiency fixes on the compiler and support native compilation to machine code. The jury is still out on whether interfaces will be added to the language. He'd like to release Python 2.3 on July 4 if possible.
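As a small example of that kind of duplication, the lines below compute the same result two ways. Python 3 eventually removed several items on the list, such as `xrange`, classic classes and `<>`, but kept `map`, `filter` and `lambda`, so both forms still run today.

```python
nums = range(10)

# map/filter with lambda...
evens_squared_a = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nums)))

# ...versus a list comprehension, which expresses the same thing more directly.
evens_squared_b = [n * n for n in nums if n % 2 == 0]

print(evens_squared_a == evens_squared_b, evens_squared_b)  # True [0, 4, 16, 36, 64]
```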

On the last day of the conference, Guido gave a lightning talk titled "Python Regrets", describing features he wishes he hadn't added to Python. See the PDF slides he made for OSCon 2002 for the complete list. Here are a few highlights:

  • .__int__() is ambiguous. It's used both for converting to an integer (truncating) and for a user integer object to specify its integer value (non-truncating). These should be separate methods.

  • myDict.has_key(key) was borrowed from ABC, but key in myDict is better.

  • Relative imports were a bad idea.

  • Tabs should not have been allowed for indentation. Or at the very least, mixed tabs and spaces should not have been allowed.

  • Using backslashes \ for line continuation is ugly. Use extra () or something else instead.

  • print and myFile.write(text) are too dissimilar. There should be a convenient but consistent way to print to the console.

  • input() is both dangerous and hard to teach. It's dangerous because it evaluates whatever the user types as a Python expression, so wrong input comes in as who knows what type, raises an exception or causes side effects. It's hard to teach because its EOF behavior is different from that of sys.stdin.readline().
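Several of these regrets can be seen in today's Python. The sketch below uses my own examples, not material from Guido's slides. It shows `int()` truncating a float alongside the `__index__` method that Python 2.5 added (PEP 357) for lossless integer conversion, which addresses much the same concern as the `__int__` regret. It also shows the `in` test that replaced `has_key()`, and parentheses in place of a backslash continuation. `Seven` is a made-up class.

```python
import operator

class Seven:
    """Hypothetical integer-like object."""
    def __index__(self):
        return 7  # "my exact integer value", never a lossy conversion

print(int(3.9))                 # 3: truncating conversion
print(operator.index(Seven()))  # 7: lossless conversion via __index__
print(list("abcdefgh")[Seven()])  # h: usable directly as a list index

try:
    operator.index(3.9)         # floats refuse lossless integer conversion
except TypeError:
    print("float has no __index__")

# has_key() is gone in Python 3; the `in` operator is the way to test keys.
prefs = {"editor": "vim"}
print("editor" in prefs)        # True

# Parentheses instead of a backslash for continuation lines.
total = (1 + 2 +
         3 + 4)
print(total)                    # 10
```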

Conclusion

The first PyCon was a success. People liked the relatively loose schedule that allowed time for private and spontaneous meetings. They were enthusiastic about the new ideas: open space and the sprints. The only complaints I heard were of the "we'll do better next time" variety: the food was the same every day; there was too much food, considering some registrants had to leave early and others didn't show up at all; the registration system for on-site registrants didn't work right; the last day was a little slow; the 15-minute refereed-paper sessions were too short. But none of these were show stoppers, and it was an impressive achievement for Python's first amateur-run conference. More than one person commented that this will likely become Python's "working" conference, the one where development gets done or planned, while others, such as OSCon, will serve mainly to promote Python to people who are attending those conferences anyway.

A final note to Pythoneers: the theme of PyCon DC 2003 was Popularizing Python, and that's also a goal of the Python Software Foundation this year. Now you have learned. Go forth and do something with it.

email: mso@oz.net
