Import This: the Tenth International Python Conference
One of the most interesting talks occurred in the Python Tools Track, "The Parrot Project: Building a Multi-Language Interpreter Engine" by Dan Sugalski. Parrot originated as an April Fool's joke concocted by Guido and Larry Wall. (What do you expect from somebody who names a language after Monty Python? Parrot is named after The Dead Parrot Sketch. O'Reilly even put up a pseudo catalog entry for the nonexistent Nutshell book.)
But then a funny thing happened on the way to the comedy club. The Perl6 team, charged with making an improved bytecode interpreter for Perl, realized that it would take just a little more work to make the interpreter run Python, Ruby, Objective-C and other languages, too. For Python, this means a third implementation of the language, alongside the standard Python (called CPython because it's written in C) and Jython (a version of Python written in Java). Guido said in his Developer's Day talk that he's not ready to give up on the CPython codebase for Parrot, so CPython will remain the standard and Parrot will be an alternative.
It turns out that most of the visible differences between programming languages--the stuff that religious wars are made of--are really only relevant at the parser level. Once the parser has processed the source code, what's left are "variables", "expressions", "functions", "for loops", etc., things that are common to every modern language. Perl, in its attempt to be the kitchen-sink Borg language that assimilates everything, in the same way that Emacs is the Editor to End All Editors, has an interpreter that performs a superset of what all other languages need. Anything that another language needs and Perl lacks can more or less easily be added to the interpreter without bothering Perl.
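To make that concrete, here is a small sketch that uses CPython's own standard `ast` module (not anything from Parrot) to show what remains of a snippet after parsing. The helper name `node_kinds` is made up for this illustration.

```python
import ast


def node_kinds(source):
    """Return the sorted, de-duplicated names of the syntax-tree node
    types that CPython's parser produces for the given source text."""
    tree = ast.parse(source)
    return sorted({type(node).__name__ for node in ast.walk(tree)})


SNIPPET = """\
total = 0
for x in items:
    total = total + x
"""

if __name__ == "__main__":
    # The surface syntax is gone; what is left are generic notions such as
    # assignments, names, a binary operation and a for loop.
    print(node_kinds(SNIPPET))
```

Nothing in that list of node types is peculiar to Python's spelling; an equivalent Perl or Ruby loop would boil down to much the same ingredients, which is the point Dan was making.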
What does Parrot offer a language? A ready-made back-end interpreter that provides OS independence, a rich set of data types, dynamically changeable types (which makes classic optimizations à la C difficult), closures, continuations, matrix math, curried functions and garbage collection. It provides a safe execution environment (resource quotas, access restrictions) for untrusted eval'd code.
Another gee-whiz feature is that you can parse Python source code to Python bytecode, convert that to Perl bytecode and then unparse it to Perl source code. I assume Parrot will come with a command-line utility to take care of the details for you.
Parrot's design goals are to:
run Perl code fast.
clean up the grotty bits of the existing interpreters (which come "preinstalled on your system for your inconvenience").
provide a good base for Perl's language features.
be easily extensible and embeddable. (Perl's binary API is so horrid that Dan never wants to use it again.)
have a long-range scalable design he won't be embarrassed about ten years from now. ("Often software lasts longer than it should.")
The secret, says Dan, is that "Python, Perl, Objective-C and Ruby are all just ALGOL with a funny hat." They all have object models that are "the same except where they're different". The differences are minor, and anything missing in the hardware or interpreter can be emulated in the runtime library at the price of speed.
Parrot assumes modern hardware with good-sized L1 and L2 caches and lots of RAM. The interpreter tries to build long "pipelines" of machine instructions. When an unpredicted decision blows the pipeline and the interpreter has to go back to main memory, it does so in a big way that minimizes the need to do it again for a while.
Parrot is register-based rather than stack-based. This performs no worse than stack-based systems on register-starved architectures like the x86 but much better than stack-based systems on other hardware. "If you don't like registers, pretend it's a large-named temp cache."
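The difference between the two designs is easier to see in a toy. The sketch below evaluates d = (a + b) * c on a miniature stack machine and on a miniature register machine. Every opcode name and the instruction layout are invented for illustration; they are not Parrot's or CPython's real instruction sets, and the toy says nothing about real-world speed.

```python
def run_stack(program, env):
    """Run a stack-machine program. Operands are implicit: every
    arithmetic op pops its inputs from, and pushes its result to, a stack."""
    stack = []
    for op, *args in program:
        if op == "load":          # load NAME: push a variable's value
            stack.append(env[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "store":       # store NAME: pop into a variable
            env[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op!r}")
    return env


def run_register(program, env, num_registers=4):
    """Run a register-machine program. Every op names its operands
    explicitly as register numbers."""
    regs = [0] * num_registers
    for op, *args in program:
        if op == "load":          # load REG, NAME
            regs[args[0]] = env[args[1]]
        elif op == "add":         # add DEST, SRC1, SRC2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "mul":         # mul DEST, SRC1, SRC2
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "store":       # store NAME, REG
            env[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return env


STACK_PROGRAM = [
    ("load", "a"), ("load", "b"), ("add",),
    ("load", "c"), ("mul",), ("store", "d"),
]

REGISTER_PROGRAM = [
    ("load", 0, "a"), ("load", 1, "b"), ("load", 2, "c"),
    ("add", 3, 0, 1), ("mul", 3, 3, 2), ("store", "d", 3),
]

if __name__ == "__main__":
    print(run_stack(STACK_PROGRAM, {"a": 2, "b": 3, "c": 4})["d"])
    print(run_register(REGISTER_PROGRAM, {"a": 2, "b": 3, "c": 4})["d"])
```

Both programs compute the same value. In the register version the loaded values of a, b and c are still sitting in registers 0 to 2 after the multiply, so a later expression could reuse them without loading them again; on the stack machine they were consumed by the arithmetic.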
Parrot's native data types include Integer, Float, Parrot String and Parrot Magic Cookies (PMCs). PMCs are a generic object type. However, language implementors are encouraged to use PMCs for all their types--even numbers--because Parrot's other three types are really only optimizing shortcuts for the interpreter and do not necessarily have the full behavior needed by your language's types.
Dead objects are detected and garbage collection is done separately. Dan claims garbage collection is very difficult to get right and most languages do a bad job of it. That's why he recommends languages use his garbage collector rather than their own. An audience member argued that Python's reference-counting scheme is more portable, but Dan stuck to his claim that reference counting sucks. However, Parrot does expose an interface to allow languages to do pseudo reference counting for those occasions where it's useful. C extension programmers who create Parrot objects do have to register them with the garbage collector if they hold on to the object past the lifetime of the function that created it. (If the object is returned from the function or is just discarded at the end of the function, the programmer does not have to register it.)
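One well-known difference between the two camps is how they treat objects that refer to each other in a cycle. The sketch below is a toy model, not Parrot's collector and not CPython's implementation: `Obj`, `mark_and_sweep`, `refcount_survivors` and `build_demo_heap` are all names invented here. It contrasts a tracing pass that starts from the roots with a pure reference counter that frees an object only when its count drops to zero.

```python
class Obj:
    """A toy heap object that holds references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references
        self.marked = False   # used only by the tracing pass


def mark_and_sweep(heap, roots):
    """Tracing collection: keep exactly what is reachable from the roots."""
    for obj in heap:
        obj.marked = False
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if not obj.marked:
            obj.marked = True
            pending.extend(obj.refs)
    return [obj.name for obj in heap if obj.marked]


def refcount_survivors(heap, roots):
    """Pure reference counting: free an object when its count reaches
    zero, then decrement the counts of everything it referred to."""
    counts = {id(obj): 0 for obj in heap}
    for obj in roots:
        counts[id(obj)] += 1
    for obj in heap:
        for target in obj.refs:
            counts[id(target)] += 1
    alive = {id(obj) for obj in heap}
    dead = [obj for obj in heap if counts[id(obj)] == 0]
    while dead:
        obj = dead.pop()
        alive.discard(id(obj))
        for target in obj.refs:
            counts[id(target)] -= 1
            if counts[id(target)] == 0:
                dead.append(target)
    return [obj.name for obj in heap if id(obj) in alive]


def build_demo_heap():
    """root -> a is live; b <-> c is an unreachable cycle;
    d -> e is an unreachable chain with no cycle."""
    root, a, b, c, d, e = (Obj(n) for n in ("root", "a", "b", "c", "d", "e"))
    root.refs.append(a)
    b.refs.append(c)
    c.refs.append(b)
    d.refs.append(e)
    return [root, a, b, c, d, e], [root]


if __name__ == "__main__":
    heap, roots = build_demo_heap()
    print(mark_and_sweep(heap, roots))      # b and c are collected
    print(refcount_survivors(heap, roots))  # b and c keep each other alive
```

Both schemes reclaim the chain d -> e, but only the tracing pass reclaims the b <-> c cycle, since each member of the cycle always holds one reference to the other.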
Language parsers can be pushed in and out at runtime. For instance, your base parser may be for Python source code. Then you encounter a regular expression, so you push a regex parser. Then, a while later, you encounter a database query string, so you push a SQL parser.
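The push-and-pop idea can be pictured as a plain stack whose top entry is whichever parser is currently in charge. The following is only a sketch of that idea under assumed names: `ParserStack` and the three stand-in parser functions are hypothetical, and nothing here reflects Parrot's actual interface.

```python
class ParserStack:
    """A stack of parsers; the most recently pushed one handles input."""

    def __init__(self, base_parser):
        self._stack = [base_parser]

    def push(self, parser):
        """Make `parser` the active parser until it is popped."""
        self._stack.append(parser)

    def pop(self):
        """Return control to the previously active parser."""
        if len(self._stack) == 1:
            raise RuntimeError("cannot pop the base parser")
        return self._stack.pop()

    def parse(self, text):
        """Hand `text` to whichever parser is on top of the stack."""
        return self._stack[-1](text)


# Stand-in "parsers" that merely tag the text with their language.
def python_parser(text):
    return ("python", text)


def regex_parser(text):
    return ("regex", text)


def sql_parser(text):
    return ("sql", text)


if __name__ == "__main__":
    parsers = ParserStack(python_parser)
    print(parsers.parse("x = 1"))
    parsers.push(regex_parser)            # a regular expression begins
    print(parsers.parse(r"\d+"))
    parsers.pop()                         # back to Python source
    parsers.push(sql_parser)              # a query string begins
    print(parsers.parse("SELECT 1"))
    parsers.pop()
    print(parsers.parse("y = 2"))
```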
Parrot's basic operations are done, and the interpreter is Turing complete. The parser isn't done yet, but an external compiler written in Python generates Parrot bytecode for a subset of Python source constructs.