Import This: the Tenth International Python Conference

In-depth coverage of this year's Python conference: who was there, what they said and what they're working on.

Tim Berners-Lee: Webizing Python, Part 1: URI Identifiers

Tim started his talk by confirming his credentials as a Python outsider. "I don't know much about Python", he said, "but Python is fun." Guido had tried hard to convince him to use Python, and he had tried hard to convince Guido to do something with the Web. Finally, Tim became a Python enthusiast when he tried to learn Python on a plane trip. He had already downloaded Python and its documentation onto his laptop, and between takeoff and landing he was able to install Python and learn enough to do something with it, "all on one battery."

Not many people can go to a web browser and type a URL and think, "I invented this baby." So it's natural that Tim would want to webize everything and that his talk would be called "Webizing Python". "Webizing" means replacing all identifiers with URIs. ("URI" is the technical term for that thing you call a URL, although it's wider in scope. See http://www.w3.org/Addressing/.) He also proposed a graph data type to help Python process Web-like data, which is discussed in the next section. Both of these goals are consistent with Tim's vision for the Web, as described in his Short History: "The dream behind the Web is of a common information space in which we communicate by sharing information." Currently, the Web excels at making human-readable information universally accessible. It has not yet, however, reached its potential for delivering machine-readable information (database data) in a similar manner. This is the goal of such efforts as XML, RDF (discussed in the next section) and integrating URIs into programming languages.

What's the easiest way to replace identifiers with URIs? Just do it and see what breaks. Then change the program (Python) to recognize the new identifiers and, again, see what breaks. Why would you want to do this in the first place? A statement such as

import http://www.w3.org/2000/10/swap/llyn.py

gives so much more information than

import llyn

such as who created it, who is responsible for maintaining it and where to find the latest version for automatic updating. It also facilitates module endorsements, which are like digital certificates: they allow a trusted authority to verify that the module you have is the official version made by some reputable party. (The standard module function urllib.urlopen already opens a URI as if it were a read-only file, honoring redirects transparently.) Using a URI does not mean that Python has to dial your modem every time you run the script; it can call a routine that acts as a proxy server and loads the module from the local disk, just like Python does now.
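None of this exists in Python today, but the idea can be approximated in ordinary code. The following is a minimal sketch written for current Python 3, not Tim's proposal: import_from_uri, fetch_source and the cache dictionary are names invented for this illustration, and urllib.request.urlopen is the present-day spelling of urllib.urlopen. The cache plays the part of the local proxy, so the network is consulted only the first time a URI is seen. Running code fetched from the network is, of course, only as safe as the endorsement scheme behind it.

```python
import posixpath
import types
import urllib.request
from urllib.parse import urlparse


def fetch_source(uri):
    """Download the module source text (assumes it is UTF-8)."""
    with urllib.request.urlopen(uri) as response:
        return response.read().decode("utf-8")


def import_from_uri(uri, fetch=fetch_source, cache=None):
    """Hypothetical helper: build a module object from source found at a URI."""
    if cache is not None and uri in cache:
        return cache[uri]          # already loaded: no network access needed
    source = fetch(uri)
    # Short alias taken from the last path component: .../llyn.py -> "llyn"
    name = posixpath.splitext(posixpath.basename(urlparse(uri).path))[0]
    module = types.ModuleType(name)
    module.__file__ = uri          # the module remembers where it came from
    exec(compile(source, uri, "exec"), module.__dict__)
    if cache is not None:
        cache[uri] = module
    return module
```

A call such as llyn = import_from_uri(uri, cache=loaded) binds the short name by hand, which is the job a URI-aware import statement would do for you.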

These are all Tim's pie-in-the-sky ideas, not something Python has committed to doing. Making Python (or any language) URI-compatible means overcoming various "closed-world assumptions" in the language, just like making a proprietary database format (or HTML) XML-compatible requires some changes.

What are Python's closed-world assumptions? First, the URI import above won't even compile because ':' and '/' aren't allowed in a module name. We'd either have to extend the identifier syntax, introduce a new quoting mechanism such as <URI>, or make the identifier a quoted string. Using full URIs in expressions would also break, because the '/' would be interpreted as the division operator. One possibility is to assume that "." in Python identifiers is "/" in the corresponding URIs.

Of course, you don't want to type the absolute URI every time you use a variable in an expression anyway. You just want a short alias name. Python's "import ... as ..." syntax already does this, and we can teach import to implicitly assign the name llyn to the long URI above.

Local variables, on the other hand, would not be linked to URIs; that would be silly. A local variable is private to its enclosing function.

Once we have modules accessible via URIs, module attributes are also accessible individually. module.attribute maps directly to a URI as http://example.com/directory/module.py#attribute. Which brings us to Tim's next topic....
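The mapping between module.attribute and a URI fragment is simple enough to sketch with the standard library. The two function names below are made up for this illustration; urldefrag is the real urllib.parse function that separates a URI from its fragment.

```python
from urllib.parse import urldefrag


def attribute_uri(module_uri, attribute):
    """Name one attribute of a module by appending a fragment."""
    return module_uri + "#" + attribute


def split_attribute_uri(uri):
    """Split an attribute URI back into (module URI, attribute name)."""
    module_uri, attribute = urldefrag(uri)
    return module_uri, attribute
```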

Tim Berners-Lee: Webizing Python, Part 2: the Graph Data Type

The more I tried to write this section, the more I realized how little I understood graphs. I recommend you read this section alongside Tim's slides from the talk, W3C's overview of RDF and the gentle introduction "What is RDF?" for a more complete picture.

A graph is not what you drew in geometry class, but something like a three-dimensional dictionary, or a dictionary of dictionaries, or a list of triples with the first two parts acting as keys, only it's more than all of that combined. The first part is akin to an ordinary dictionary key: a unique identifier for an object, or what database people call a "primary key". The second part represents an attribute of that object. The third part is the value.

Why stop at one level of attributes? Why not recursively go down an arbitrary number of levels? thats.the.python.way.isnt.it? It turns out that one level of properties is exactly what you need to represent items in a database table (row,column,value), a tree of hyperlinks (URI,fragment,content) or Resource Description Framework (RDF) metadata. RDF is an XML-compatible format that allows you to describe the basic properties of web pages (title, author, abstract, publishing date, etc.). That's enough information to build a smart search engine. Current search engines operate on only one criterion--raw document text--because no other criteria are available. But with metadata, they could search on multiple criteria.

So one node in an RDF graph might be:

('http://example.com/dir/subdir/important-document.html', 'author', 'Barney Rubble')

But RDF is not limited to indexing mundane web pages. It can be used for medical information or any other type of database data.
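To see why one level of properties is enough for database data, here is a rough sketch that flattens table rows into (row, column, value) triples. The function name and the sample table are assumptions made for illustration; only the Barney Rubble record comes from the example above.

```python
def table_to_triples(rows, key):
    """Flatten rows (a list of dicts) into (row, column, value) triples.

    `key` names the column that identifies each row (the primary key).
    """
    triples = []
    for row in rows:
        for column, value in row.items():
            if column != key:
                triples.append((row[key], column, value))
    return triples


# A one-row "table" holding the record from the RDF example.
documents = [
    {"uri": "http://example.com/dir/subdir/important-document.html",
     "author": "Barney Rubble"},
]
triples = table_to_triples(documents, key="uri")
```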

For instance, say we have a graph literal like this:

g = {sky color blue, gray; madeOf air.
     sea color grey.
     grey sameAs gray. }

that defines the following graph:

Row      Column   Value
sky      color    blue
sky      color    gray
sky      madeOf   air
sea      color    grey
grey == gray
colour == color
  
Each word is a variable, which may have been initialized from a string literal, a URI or another object. Each node definition ends in a period ("."). Syntax shortcuts (",", ";") allow multiple nodes to be created from one line.
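Since the literal syntax is only a proposal, the closest we can get today is to parse it from a string. The toy parser below is my own sketch, not anything Tim showed: parse_graph is a hypothetical name, words are kept as plain strings rather than variables, and splitting on "." means it would choke on real URIs. It treats sameAs as an ordinary node, which may or may not be what Tim intended.

```python
def parse_graph(text):
    """Expand the ',' and ';' shortcuts into (row, column, value) triples."""
    triples = []
    for statement in text.split("."):          # "." ends a statement
        statement = statement.strip()
        if not statement:
            continue
        subject = None
        for clause in statement.split(";"):    # ";" keeps the same subject
            words = clause.split()
            if subject is None:
                subject, words = words[0], words[1:]
            prop, rest = words[0], " ".join(words[1:])
            for value in rest.split(","):      # "," keeps subject and property
                triples.append((subject, prop, value.strip()))
    return triples


# The alias statement follows the "grey == gray" row of the table.
LITERAL = """sky color blue, gray; madeOf air.
             sea color grey.
             grey sameAs gray."""
triples = parse_graph(LITERAL)
```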

sameAs aliases two variants together, so that a query for one will also match the other. Here this is used to keep Brits happy when they spell "gray" and "color" wrong (kidding). I'm not sure whether Tim intended sameAs to generate an ordinary graph node or a special alias object, so I have shown it specially in the table.
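One way to get that behavior, assuming sameAs is stored as an ordinary (alias, sameAs, canonical) triple, is to translate every word to its canonical spelling before comparing. This is only a guess at the semantics, and all three function names are invented for the sketch.

```python
def build_aliases(triples):
    """Collect alias -> canonical pairs from the sameAs triples."""
    return {s: v for s, p, v in triples if p == "sameAs"}


def canonical(word, aliases):
    """Follow alias links to the canonical spelling (guards against loops)."""
    seen = set()
    while word in aliases and word not in seen:
        seen.add(word)
        word = aliases[word]
    return word


def subjects_with(triples, prop, value):
    """Find rows whose property has the given value, in any spelling."""
    aliases = build_aliases(triples)
    prop, value = canonical(prop, aliases), canonical(value, aliases)
    return [s for s, p, v in triples
            if p != "sameAs"
            and canonical(p, aliases) == prop
            and canonical(v, aliases) == value]


triples = [
    ("sky", "color", "blue"), ("sky", "color", "gray"),
    ("sky", "madeOf", "air"), ("sea", "color", "grey"),
    ("grey", "sameAs", "gray"), ("colour", "sameAs", "color"),
]
matches = subjects_with(triples, "colour", "grey")
```

Here the British query finds both the sky and the sea, even though only the sea was stored with the "grey" spelling.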

Armed with your graph object, you can run queries like:

g.about[sky]    # Returns a dictionary: 
                # {color: [blue, gray], madeOf: air }

Here, 'color' maps to a list of multiple values (blue, gray). Python dictionaries cannot have duplicate keys, but graphs may have multiple nodes with identical key pairs. The values will have to be returned somehow, and putting them into a list is as good a way as any.
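With the graph held as a plain list of triples, a query like g.about[sky] is a few lines of ordinary Python. The about function below is a stand-in I made up, not the proposed API, and for simplicity it wraps every value in a list, even a single one such as madeOf.

```python
from collections import defaultdict


def about(triples, subject):
    """Return {property: [values...]} for every triple about `subject`."""
    result = defaultdict(list)
    for s, prop, value in triples:
        if s == subject:
            result[prop].append(value)   # duplicate key pairs pile up here
    return dict(result)


triples = [
    ("sky", "color", "blue"), ("sky", "color", "gray"),
    ("sky", "madeOf", "air"), ("sea", "color", "grey"),
]
sky = about(triples, "sky")
```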

You may also want to query, "Find all the nodes where X is true." This is a variation of the "find everything under a common parent" task. One or more parts can be wildcarded with a "*" symbol, or maybe Python's None object would be better. Here are some possible APIs under consideration:

g.any({sky color *})                   # Returns a list: [blue, gray]
for row, value in g.all({* color *}): ...   # Loops over every node that has a color.
g = g + {sky color blue}               # Adds a node to the graph.
g.toxml()                              # Serialize to XML format.
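Until such an API exists, the wildcard queries can be imitated with ordinary functions, using None as the wildcard. The names match and any_values are assumptions of this sketch, and the graph is again just a list of triples.

```python
def match(triples, subject=None, prop=None, value=None):
    """Yield the triples that fit the pattern; None acts as the '*' wildcard."""
    for s, p, v in triples:
        if subject in (None, s) and prop in (None, p) and value in (None, v):
            yield (s, p, v)


def any_values(triples, subject, prop):
    """Rough stand-in for g.any({subject prop *})."""
    return [v for _, _, v in match(triples, subject, prop)]


triples = [
    ("sky", "color", "blue"), ("sky", "color", "gray"),
    ("sky", "madeOf", "air"), ("sea", "color", "grey"),
]
sky_colors = any_values(triples, "sky", "color")
colored = [(row, value) for row, _, value in match(triples, prop="color")]
```

Adding a node is then plain list concatenation, in the spirit of g = g + {...}.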

If the graph includes some kind of date or version columns, you could also query, "Is there anything out of date that this node depends on?"

Python itself can benefit from graphs to provide a standardized way to return a wide variety of information that is now handled by multiple ad hoc formats: system information returned by os.system, introspection data returned by the inspect module (e.g., the methods provided by a certain object), an improved DB-API (database API) and a serialization/parsing format for any data.

Another thing we'd need is a visual graph browser to inspect, update, reload and delete nodes.

Somebody in the audience asked whether you really needed to change Python, since one can implement a graph as an ordinary class. Tim said yes, you can use a class, but the reason for building the type into Python itself is to provide a more convenient syntax for constructing graph literals. In that sense, it's similar to the complex, slice and ellipsis types, which are not used in ordinary Python but make Numeric Python more convenient.
