Getting Started With Quantum GIS

James Gray continues his review of desktop Geographic Information System (GIS) applications with an introduction to the concepts needed to get started with the user-friendly, open-source Quantum GIS.

Going Deeper into GIS: an Introduction to QGIS
If you've tinkered around with Google Earth, you know how much fun it can be to work with geospatial data. Whenever I need a good diversion, I fire up Google Earth to float above the skyscrapers of Manhattan or to see if they've built anything new in the neighborhood where I grew up.

To take things a step further and gain a new level of control with geospatial information - where you're the chef who concocts the whole stew - dive into GIS, or geographic information systems. Not only will a GIS make you feel like you have the whole world in your hands, but you will probably be able to do something extremely useful with it for your work or private life.

Incidentally, though I will discuss many basic concepts, if you are new to GIS, you may want to read my earlier, broader introduction to GIS, which can be found here.

The purpose of the article at hand is to give newbies to GIS some useful background before getting started with Quantum GIS, or QGIS, one of the more advanced, comprehensive, user-friendly desktop GIS applications in the open-source world. While QGIS has excellent documentation, those who are new to GIS might find the terminology a bit stilted and some information missing. This is because the authors of the documentation assume that you are already familiar with GIS and are coming to QGIS to find an alternative. I, on the other hand, will assume that you have never used a GIS before.

Installing QGIS
QGIS has a useful, comprehensive Web site with plenty of resources for you to chew on, which you'll find here. You'll find a wiki, help forums, loads of documentation and, of course, the application itself. The download page offers versions for Mac OS X, Windows, and several options for Linux users: source version, Debian, Ubuntu Gutsy and openSUSE.

Just so you know you're doing it right, the QGIS GUI should look like this

Before we both do any damage, though, let's review some of the concepts we need to use QGIS with confidence.

What Do I Do With a GIS?
Whenever I tell a lay person what GIS is, I generally tell them it involves "mapping with a computer". While this description is a bit oversimplistic, it captures the broad purpose of GIS. Here is how the QGIS folks describe GIS:

A [GIS] is a collection of software that allows you to create, query and analyse geospatial data. Geospatial data refers to information about the geographic location of an entity. This often involves the use of a geographic coordinate, like a latitude or longitude value. Spatial data is another commonly used term, as are: geographic data, GIS data, map data, location data, coordinate data and spatial geometry data.

I would further distill the above by saying that, by applying the power of the computer, you can use a GIS to pull in any kind of geographic information, find relationships among that information and display it however you wish.

The geographic information we work with in a GIS consists of two elements, namely spatial features and attribute data. Examples of spatial features might include streets, rivers or land cover; in other words, any feature you might find on a map. Meanwhile, attribute data describes the characteristics of the spatial features and is stored in a database within the GIS. For example, most of those streets have names and lengths, and the land-cover types have names and areas associated with them. In the land-cover case, a GIS might store categories such as high-density urban, low-density urban, cropland, forest, etc., which one could then query easily.
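To make the pairing of spatial features and attribute data a bit more concrete, here is a minimal Python sketch. It is not how QGIS stores anything internally; the feature list, the field names (land_cover, area_km2) and the helper select_by_attribute are all made up for illustration.

```python
# Hypothetical, hand-made data: each feature pairs a geometry with attributes.
# The geometry is only a label here; the attributes are what we query.
features = [
    {"geometry": "polygon A", "attributes": {"land_cover": "forest", "area_km2": 12.5}},
    {"geometry": "polygon B", "attributes": {"land_cover": "cropland", "area_km2": 30.0}},
    {"geometry": "polygon C", "attributes": {"land_cover": "forest", "area_km2": 7.5}},
]


def select_by_attribute(features, key, value):
    """Return the features whose attribute `key` equals `value`."""
    return [f for f in features if f["attributes"].get(key) == value]


# Query: which polygons are forest, and how much area do they cover in total?
forest = select_by_attribute(features, "land_cover", "forest")
total_forest_area = sum(f["attributes"]["area_km2"] for f in forest)
print(len(forest), "forest polygons covering", total_forest_area, "square km")
```

A real GIS runs this kind of query against its attribute database, but the idea is the same: pick features by what they are, not just by where they are.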

Now that we know what a GIS can do, we should spend some time exploring how a GIS works with data.

Peel Back the Layers
Your road map would think you were completely loco if you commanded it to "just show me the rivers and mountains, please", or "hey you, map, flip the county boundaries on and off." On the other hand, because a GIS portrays data in similar groupings of geographic elements, called layers, your computer will follow your command and not label you 'loopy'. Some examples of layers are countries, cities, rivers, specific buildings, and oceans. A GIS allows you to control which layers are displayed on your screen at any time.

Layers can consist of two types, namely features and surfaces. In our above list, the layers with countries, cities, rivers and specific buildings are feature-based; oceans are one single, continuous expanse, and are thus a surface.

How a GIS Does Data: Vector vs. Raster
The hefty challenge for a GIS is to portray our lovely yet complex world accurately yet rapidly - and without the need for a supercomputer! There are two 'tricks', or methods, a GIS uses to bring a digital representation of Earth's features to your desktop.

The first method is using vector data. As complicated as the world can be, a GIS can represent any geographical object using three geometric elements - i.e. points, lines and polygons. Small stuff like community centers and traffic lights can be portrayed as points. Things like rivers and pipelines are really just glorified lines, so they can be shown as such. Finally, nearly everything else, such as a state park, though it might be oddly shaped, is finite and contained in boundaries, making it a polygon at the end of the day. Broadly speaking, then, the vector format is analogous to traditional maps, where the world is abstracted with symbology, and precision is very important.
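If you like to see ideas in code, the three vector building blocks can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the coordinates are invented, they sit on a flat x/y plane rather than on the real earth, and the helper names line_length and polygon_area are mine, not part of QGIS.

```python
import math

# Invented coordinates on a flat x/y plane (arbitrary units).
community_center = (2.0, 3.0)                                   # a point
pipeline = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                # a line
state_park = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # a polygon ring


def line_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area by the shoelace formula; the last vertex connects back to the first."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print("pipeline length:", line_length(pipeline))
print("park area:", polygon_area(state_park))
```

Notice that a point needs only one coordinate pair, a line needs an ordered list of them, and a polygon is simply a list whose end joins its beginning. That is all the geometry a shapefile has to store.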

The second method is raster data. Raster data is used to visually portray characteristics of the Earth that have no distinct shape, including measurements like ocean depth, forest cover type, elevation and annual rainfall. A raster is a matrix of same-sized square cells, each representing a unit of surface area on the planet, e.g. 100 square meters. For instance, we could go back to the ocean we discussed earlier as a surface. Each cell contains a depth value, which can in turn be displayed logically with varying color values - e.g. the deeper the water, the darker the color. Some image types you will encounter include GeoTIFFs, Erdas Imagine Images, GRASS AIGs and USGS Digital Elevation Models.

Some common types of raster-based imagery are satellite images and aerial photos. In these two types of raster imagery, the value of each cell is a measurement of light that is reflected off of the Earth's surface. Particular ranges of these values can signify specific land-cover or vegetation types.
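The ocean-depth example above can be sketched as a tiny raster in Python. The depth values are invented, and the linear gray-scale mapping in the hypothetical shade function is just one simple way to turn cell values into colors; real GIS software offers many more color schemes.

```python
# Hypothetical ocean depths in meters, one value per same-sized cell.
depth_grid = [
    [10, 50, 200],
    [40, 300, 1200],
    [150, 1500, 4000],
]


def shade(depth, max_depth):
    """Map a depth to a gray value from 0 to 255: the deeper, the darker."""
    return round(255 * (1 - depth / max_depth))


# Find the deepest cell, then color every cell relative to it.
max_depth = max(max(row) for row in depth_grid)
shaded = [[shade(d, max_depth) for d in row] for row in depth_grid]

for row in shaded:
    print(row)
```

The deepest cell comes out as 0 (black) and the shallowest cells come out close to 255 (white), which is the "deeper water, darker color" rule in miniature.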

Spatial Data Formats in GIS
As you splash around in the world of GIS, you will also encounter a plethora of spatial file formats. If you have ever used the application ArcGIS from ESRI, you are probably familiar with geodatabases and coverages, two of the most common spatial file formats in proprietary GIS. These two formats allow one to store more than one feature class - i.e. groups of points, lines or polygons - in a single file. The geodatabase is the newer, more advanced of the two formats, enabling more advanced relationships among the different feature classes by using a so-called "spatial reference system". A geodatabase also builds on the existing features of relational databases.

Of these two more advanced spatial data formats, only coverages are usable in QGIS; geodatabases are not. In addition, in QGIS, we can utilize ESRI shapefiles, which are plentiful in online data repositories and a sort of 'standard' since they have been around a long time. In fact, shapefiles are the standard format for ESRI's ArcView, which is the company's previous generation of GIS applications. Essentially, a shapefile is a set of files with vector-based location and attribute data, which can be represented in a GIS application.

QGIS also supports some other file formats, such as MapInfo and PostGIS. More on those in future articles.

Some Hardcore Cartography: Map Projections and Coordinate Systems
Two other important concepts, which are as critical to any cartographic endeavor as to GIS, are map projections and coordinate systems.

Remember the big, flat world map you had in your 4th grade classroom? The one with Greenland bigger than Africa? That map is an ideal illustration of what happens when you depict a round object such as the earth on a flat map. Converting a 3-D globe into a 2-D map is called a map projection.

In a GIS, you need to consider the projection because any map you view or create is essentially flat like a paper map. Thus the concept of map projection applies to both situations!

In a map projection, you cannot avoid distortion with regard to depicting spatial data. You have to live with distortions in a map's shape, area, distance or direction. (So sorry to break the news!) Reducing distortion in some of the properties will only increase it in others. Our flat wall map with Greenland on steroids illustrates this conundrum nicely. That classic map probably utilized the famous and ubiquitous Mercator projection, which maintains accurate direction but clearly blows it regarding area up near both poles where the lines of latitude stretch upwards like Silly Putty. At the same time, take note that near the equator, the area-related distortion is less pronounced. This observation gives weight to the important idea that the projection you choose greatly affects the accuracy of your analysis in a GIS. In future articles, we will discuss how to choose the best projection for your needs given your geographic focus and what kinds of analysis you will perform.
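To put rough numbers on the Greenland effect, here is a short Python sketch of the Mercator projection. It assumes a perfectly spherical earth, which is a simplification, and the function names mercator_y and area_exaggeration are my own. On a sphere, Mercator stretches lengths by 1/cos(latitude), so areas are exaggerated by the square of that factor.

```python
import math


def mercator_y(lat_deg, radius=1.0):
    """Vertical map coordinate of a latitude on a spherical Mercator map."""
    lat = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + lat / 2))


def area_exaggeration(lat_deg):
    """How many times larger an area appears than it would at the equator."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# 0 is the equator; 72 degrees north is a latitude that crosses Greenland.
for lat in (0, 30, 60, 72):
    print(f"latitude {lat:2d}: y = {mercator_y(lat):.3f}, "
          f"area exaggeration = {area_exaggeration(lat):.1f}x")
```

At the equator the exaggeration factor is 1, at 60 degrees it is already 4, and it keeps climbing toward the poles. That is why a map that behaves nicely for equatorial regions can mislead you badly at high latitudes.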

Just as important as the map projection is the coordinate system. A coordinate system is the Cartesian system of x- and y-axes that a GIS uses to define locations on a map. This is opposed to the latitude and longitude system that defines location on a sphere.

In order to be able to define points on the earth with a GIS, you must first determine what your earth looks like. The cartographic geeks call the earth's shape an "oblate spheroid", meaning it is flatter on the top and bottom (at the poles) and bulging around the middle. Besides this general shape, there are other bumps and bulges to consider.

To the rescue are the various options to choose for the spheroid and datum of your GIS project. The spheroid is the model of the shape and size of the earth; some common ones are Clarke 1866 and GRS 80. Meanwhile, a datum is a set of control points, whose locations and geometric relationships have been determined via measurement or calculation. A datum also defines the orientation and origin of the lines of latitude and longitude. The NAD27 datum accompanies the Clarke 1866 spheroid mentioned above; the NAD83 datum goes with the GRS 80 spheroid.
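To see how little, and yet how much, two spheroids can differ, the Python sketch below compares Clarke 1866 and GRS 80. The axis lengths and the GRS 80 inverse flattening are the commonly published defining values, quoted here from memory, so double-check them against an authoritative source before relying on them. The variable and function names are my own.

```python
# Commonly published defining values, in meters (verify before serious use).
CLARKE_1866_A = 6378206.4   # semi-major axis (equatorial radius)
CLARKE_1866_B = 6356583.8   # semi-minor axis (polar radius)

GRS_80_A = 6378137.0                 # semi-major axis
GRS_80_INV_FLATTENING = 298.257222101


def flattening(a, b):
    """Flattening f = (a - b) / a: how squashed the spheroid is at the poles."""
    return (a - b) / a


# GRS 80 is defined by its flattening, so derive the polar radius from it.
GRS_80_B = GRS_80_A * (1 - 1 / GRS_80_INV_FLATTENING)

clarke_f = flattening(CLARKE_1866_A, CLARKE_1866_B)
grs_f = flattening(GRS_80_A, GRS_80_B)

print(f"Clarke 1866: 1/f = {1 / clarke_f:.3f}")
print(f"GRS 80:      1/f = {1 / grs_f:.3f}, polar radius = {GRS_80_B:.1f} m")
print(f"Equatorial radii differ by {CLARKE_1866_A - GRS_80_A:.1f} m")
```

A difference of a few dozen meters in the earth's radius sounds trivial, but it is one reason the same latitude and longitude can land in slightly different spots under NAD27 and NAD83.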

Enough Theory Already, Let's Get Some Vector Data
At this point we have enough GIS 'theory' to know what we're doing and start a project. In this article, I'll show you how to locate some vector data in the form of an ESRI shapefile and get it loaded for display. In subsequent articles, we'll manipulate and query the vector-based data, as well as work with raster data.

The QGIS project is kind enough to supply a sample data set, which you can download here. Once you've downloaded the data set, follow these steps to load a shapefile:

1. There are four icons with plus signs. Click on the one furthest to the left that is labeled "Add vector layer".

2. At the dialogue box you will encounter five different shapefiles, namely Alaska, Canada, Lakes, Majrivers and Russia. Let's open the Alaska shapefile by choosing alaska.shp.

Note that each shapefile in turn consists of at least four subfiles with the suffixes .shp (containing features), .sbn (spatial index), .dbf (containing attributes) and .shx (shape index). The shapefiles are missing the projection files, which would be labeled .prj.
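If you want to check which of these companion files are actually sitting next to a given .shp file, a few lines of Python will do it. This is a sketch of my own, not a QGIS tool: the helper name shapefile_parts is hypothetical, and the path "alaska.shp" assumes you run the script from the folder where you unpacked the sample data.

```python
from pathlib import Path

# Companion files described above, keyed by suffix.
EXPECTED_SUFFIXES = {
    ".shp": "features",
    ".shx": "shape index",
    ".dbf": "attributes",
    ".sbn": "spatial index",
    ".prj": "projection",
}


def shapefile_parts(shp_path):
    """Report, per suffix, whether that companion file exists on disk."""
    base = Path(shp_path)
    return {suffix: base.with_suffix(suffix).exists() for suffix in EXPECTED_SUFFIXES}


# Assumed location: the current folder holds the unpacked sample data.
for suffix, present in shapefile_parts("alaska.shp").items():
    status = "found" if present else "missing"
    print(f"{suffix} ({EXPECTED_SUFFIXES[suffix]}): {status}")
```

If the sample data matches the description above, the .prj line should report "missing", which is a handy reminder that the shapefile carries no projection information of its own.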

This is what your resulting shapefile should look like, a simple outline, i.e. a polygon, of the state of Alaska.

Notice how the shapefile layer shows up on the left side of QGIS, in the area called the "map legend". Meanwhile, the simple outline map of Alaska shows up on the right side in what is called the "map view". You are able to turn the layer, called "alaska", on and off with the checkbox next to its label. The map legend acts like a control center for managing all of your layers. If you want to delete a layer, simply right-click on it and select "remove".

Theory to Practice, Practice, Practice
In this article I covered many of the concepts that underpin any GIS, including the application QGIS. While this information exists in pieces in books and on the Internet, it is not summarized concisely in a single place. Now that we've discussed concepts such as layers, vectors, rasters, file formats (e.g. shapefiles, coverages, GeoTIFFs), projections and coordinate systems in an introductory manner, we are ready to harness the power of QGIS to do some fun and fascinating work with geospatial data. This series on open-source GIS will continue in the coming weeks. Good luck and have fun!
