The Past, Present and Future of GIS: PostGIS 2.0 Is Here!
Extend PostgreSQL's capabilities with PostGIS 2.0 and discover all the magic of spatial databases.
Even if you're unfamiliar with GIS, I am pretty sure you know what Web mapping is. GIS stands for geographic information systems, and it originated in the early 1970s as a set of tools and techniques for scientists (cartographers, land planners and biologists). Since then, the field has evolved amazingly, like many other computer-related fields. One of the most revolutionary changes is that maps, and especially Web mapping, are now a common experience for millions of people in everyday life. In the past few years, people not only have been using more and more mapping apps, but there also has been an explosion in personal Web mapping. Today, a lot of blogs and personal Web sites have maps.
What Is PostGIS?
So, what's special about spatial data? Not really very much. A lot of data has location references (think of your address book as a trivial example), but the spatial component is not really organized. When you want to organize your spatial data, you need to do it with the proper tools.
Spatial data, as all other data types, needs to be stored somewhere. An RDBMS is a great tool for storing, processing and analyzing huge amounts of data, but you will need an RDBMS with a spatial extension if you are going to go this route. Do you know a great open-source RDBMS? I bet you do. Many of us commonly use MySQL in Web applications, but when it comes to spatial data, it's not the first choice. Your friend when it comes to spatial data is PostGIS, an amazing companion of PostgreSQL.
I'm sure you've heard of PostgreSQL. It's probably the most famous open-source RDBMS, and LJ has covered it often in the past. If you're not familiar with it though, check out Reuven M. Lerner's "PostgreSQL 9.0" in the April 2011 issue of LJ (http://www.linuxjournal.com/article/10986).
PostGIS is not a new project. It started in 2001 and reached maturity with release 1.0 in 2005. On April 3, 2012, 2.0 was released. Version 2.0 is a major shift, and it broke backward compatibility. The PostGIS developers had to make this break because of a new serialization format (see Resources). On June 22, 2012, version 2.0.1 was released, a bug-fixing release, and this is the latest release at the time of this writing.
Whether or not you have PostgreSQL installed on your Linux box, getting PostGIS up and running is really simple. You can download the source code and compile it yourself, which isn't hard, but it's not really necessary for a first look at PostGIS. If you love compiling, take a look at the reference material—the official documentation is very detailed and complete. There also are lots of blog posts from the community about custom installations.
When you have no specific requirements, the easy way often is the best. You can use the package delivered by your Linux distribution (for example, type apt-get install postgresql-9.1-postgis on Debian-based distributions). However, as with other rapidly evolving software, you are not going to find the latest release there.
A binary prepared by EnterpriseDB may come in handy if you want the bleeding-edge version. Installation is really straightforward, and it also includes Stack Builder, a utility to add tools and upgrade your installation with future releases.
Because PostGIS is an extension of PostgreSQL, you may wonder what it adds to the many functions shipped with PostgreSQL. In a nutshell, it extends the storage, retrieval and analysis capabilities to spatial objects. Let's look at an example to better explain how it works. You know an RDBMS can answer questions like "How many employees are currently on holiday in each department?" The standard way to ask it with PostgreSQL is by speaking SQL:
SELECT COUNT(E.SERIAL) AS NUM, D.NAME
FROM EMPLOYEES E
JOIN DEPARTMENT D USING (DEP_ID)
WHERE E.ON_HOLIDAY = 1
GROUP BY D.NAME
ORDER BY D.NAME;
What if your question has a spatial component? Suppose you want to know how many houses are within 3 kilometers of the new highway path in your county. Standard SQL has no features to express this, but here comes PostGIS to help perform the analysis:
SELECT COUNT(id)
FROM houses
WHERE ST_DWithin(
  geom,
  (SELECT highway.geom
   FROM county, highway
   WHERE ST_Intersects(county.geom, highway.geom)
     AND county.name = 'Orange'
     AND highway.name = 'Interstate 5'),
  3000);  -- distance in the units of the spatial reference system (meters here)
Does it seem powerful? Indeed it is! The code fragment above should give you some hints about what PostGIS provides: a huge set of special functions, prefixed with ST_, for querying and processing, plus two new data types called geometry and geography.
Of course, geometry and geography are the data types for spatial features. They are quite similar. Both let you store simple geometrical objects in a table. The big difference is that geography accepts geodetic coordinates (that is, expressed in degrees on a spherical reference system), while geometry accepts coordinates defined over a planar reference system. Geography was introduced in PostGIS with release 1.5.0, and due to underlying complex math, only a few functions support it.
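To make the difference concrete, here is a minimal sketch that measures the same pair of points with both types. The coordinates are rough, illustrative longitude/latitude values for London and Paris (an assumption of this example, not data from the article). With geometry, ST_Distance works on the plane and returns a value in the units of the coordinates (degrees here); with geography, it works on the spheroid and returns meters.

```sql
-- Same two points, measured as geometry and as geography.
-- Coordinates are illustrative values chosen for this sketch.
SELECT
  ST_Distance(
    ST_GeomFromText('POINT(-0.1276 51.5072)', 4326),
    ST_GeomFromText('POINT(2.3522 48.8566)', 4326)
  ) AS planar_distance_in_degrees,
  ST_Distance(
    ST_GeogFromText('SRID=4326;POINT(-0.1276 51.5072)'),
    ST_GeogFromText('SRID=4326;POINT(2.3522 48.8566)')
  ) AS geodetic_distance_in_meters;
```

The first number is of little practical use, which is why planar work on geometry usually is done in a projected reference system.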
The simple features I'm talking about are points, lines and polygons. With them, you can model the real world. Indeed, this is a standard approach: the simple features' properties and behaviors were modeled by the Open Geospatial Consortium (OGC, an organization committed to defining open standards for GIS and data interoperability), and PostGIS, since its early versions, has been built with strong support for that standard.
Adding geometry support to a table is really simple. Suppose you are building a table of world capitals. You would start with basic properties:
CREATE TABLE capitals (
  id SERIAL,
  state_name TEXT,
  capital_name TEXT,
  population numeric(8,0),
  PRIMARY KEY(id)
);
If you are going to store features that can be represented on a map, you need to add a spatial reference. Point geometry may be a good choice for a capital, and AddGeometryColumn is the function you need:
SELECT AddGeometryColumn('gisuser', 'capitals', 'geom', 4326, 'POINT', 2);
Here, you passed values for schema, table name, geometry column name, spatial reference system (SRID 4326, that is, WGS 84 longitude/latitude) and geometry type. The last value means you want a two-dimensional geometry (that is, a point defined on a surface). If you are going to store elevation, you can set three as the dimension value. And there's more: PostGIS also supports four-dimensional geometry. Well, the fourth dimension is not for time travel, but it is useful for associating a measure with the geometry, and the fourth dimension is indeed called M. For example, a stream network may be modeled as a multilinestring value with the M coordinate values measuring the distance from the mouth of the stream. The ST_LocateBetween function may be used to find all the parts of the stream that are between, for example, 10 and 12 kilometers from the mouth.
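As a rough sketch of the stream example, the query below builds a made-up measured line (the coordinates and the choice of meters for M are assumptions of this example) and asks ST_LocateBetween for the part whose measure falls between 10,000 and 12,000, that is, 10 to 12 kilometers from the mouth.

```sql
-- A hypothetical stream: x y m triples, with m = meters from the mouth.
SELECT ST_AsText(
  ST_LocateBetween(
    ST_GeomFromText('LINESTRING M (0 0 0, 0 5000 5000, 0 15000 15000)'),
    10000,   -- start measure
    12000    -- end measure
  )
) AS stretch_between_10_and_12_km;
```

The result is a measured geometry covering only the requested stretch, with the end points interpolated along the original line.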
Before using your table, it is better to create an index on the geometry column. The syntax is the same as for any other index creation; the index type is GiST (Generalized Search Tree), which behaves somewhat like an R-Tree index:
CREATE INDEX capitals_geom_gist ON capitals USING gist (geom);
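A spatial index pays off on queries that filter by location. As a minimal sketch, the bounding-box operator && compares the bounding box of each geometry with a search rectangle, which is the kind of test a GiST index can answer. The rectangle below is an arbitrary box chosen for this example.

```sql
-- Capitals whose bounding box overlaps a longitude/latitude rectangle.
-- ST_MakeEnvelope takes xmin, ymin, xmax, ymax and the SRID.
SELECT capital_name
FROM capitals
WHERE geom && ST_MakeEnvelope(-1.0, 51.0, 1.0, 52.0, 4326);
```

Prefixing the query with EXPLAIN shows whether the planner actually picks the index; on a very small table, it may reasonably prefer a sequential scan.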
Now let's add real data to the table. How do you insert values in the geometry column? The ST_GeomFromText function builds a geometry from a text description and a spatial reference ID. So let's insert the coordinates you picked up in London when you were watching the Olympic games:
INSERT INTO capitals (state_name, capital_name, population, geom)
VALUES ('UK', 'London', 6500000,
        ST_GeomFromText('POINT(-0.01639 51.53861)', 4326));
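To check what was stored, you can read the geometry back in text form. This is a small sketch using the capitals table defined earlier; ST_AsText, ST_SRID, ST_X and ST_Y are standard PostGIS accessors.

```sql
-- Read the point back as WKT, with its SRID and separate coordinates.
SELECT capital_name,
       ST_AsText(geom) AS wkt,
       ST_SRID(geom)   AS srid,
       ST_X(geom)      AS longitude,
       ST_Y(geom)      AS latitude
FROM capitals
WHERE capital_name = 'London';
```

Note that WKT separates the coordinates of a point with a space, and that longitude (x) comes before latitude (y).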
The text you are passing to the function is called a Well-Known Text (WKT) representation of spatial objects. Points are really simple to define, but how do you express a line or a polygon? You could mimic the capitals table definition to create a rivers table and add a record for the Thames:
ST_GeomFromText('LINESTRING(0.31221 51.47033, 0.33477 51.45171,
  0.44437 51.45851, 0.45877 51.48934, 0.61523 51.49512)', 4326)
Another table could contain famous buildings represented by polygons. You can find Westminster Abbey here:
ST_GeomFromText('POLYGON((-0.12850 51.49963, -0.12856 51.49929,
  -0.12814 51.49927, -0.12822 51.49896, -0.12722 51.49890,
  -0.12714 51.49919, -0.12627 51.49933, -0.12711 51.49957,
  -0.12707 51.49971, -0.12751 51.49974, -0.12758 51.49956,
  -0.12850 51.49963),
  (-0.12810 51.49902, -0.12805 51.49924,
  -0.12757 51.49921, -0.12761 51.49897, -0.12810 51.49902))', 4326)
The WKT for the polygon contains two coordinate lists, each enclosed in parentheses, while a line always is defined by a single list. That is because a polygon may contain holes. The first list defines the external ring of the polygon, while the following lists (you can have as many as you need) define internal rings that encircle holes.
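The effect of an internal ring is easy to see on a toy shape. The sketch below uses a made-up 10 x 10 square with a 2 x 2 hole in the middle (plain planar coordinates, no SRID, chosen only for this example) and checks the ring count, the area and whether points fall inside.

```sql
-- A 10 x 10 square with a 2 x 2 hole: one external ring, one internal ring.
SELECT ST_NumInteriorRings(poly) AS holes,                                -- 1
       ST_Area(poly)             AS area,                                 -- 96 (100 - 4)
       ST_Contains(poly, ST_GeomFromText('POINT(1 1)')) AS in_polygon,    -- true
       ST_Contains(poly, ST_GeomFromText('POINT(5 5)')) AS in_the_hole    -- false
FROM (SELECT ST_GeomFromText(
        'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0),
                 (4 4, 6 4, 6 6, 4 6, 4 4))') AS poly) AS shape;
```

The point at (5 5) lies inside the external ring but within the hole, so it does not belong to the polygon.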
Stefano Iacovella is a longtime GIS developer and consultant. He strongly believes in open source and constantly tries to spread the word, not only in the GIS sector. When not playing with polygons and linestrings, he loves reading travel books, riding