Scientific Graphing in Python

In my last few articles, I looked at several different Python modules that are useful for doing computations. But what tools are available to help you analyze the results from those computations? Although you could do some statistical analysis, sometimes the best tool is a graphical representation of the results. The human mind is extremely good at spotting patterns and seeing trends in visual information. The standard Python module for this type of work is matplotlib. With matplotlib, you can create complex graphics of your data to help you discover relationships.

You always can install matplotlib from source; however, it's easier to install it from your distribution's package manager. For example, in Debian-based distributions, you would install it with this:

sudo apt-get install python3-matplotlib

The python-matplotlib-doc package also includes extra documentation for matplotlib.

Like other large Python modules, matplotlib is broken down into several sub-modules. Let's start with pyplot. This sub-module contains most of the functions you will want to use to graph your data. Because of the long names involved, you likely will want to import it as something shorter. In the following examples, I'm using:

import matplotlib.pyplot as plt

The pyplot interface of matplotlib is modeled on the plotting commands of MATLAB. The graphical functions are broken down into two broad categories: high-level functions and low-level functions. These functions don't work directly with your screen. All of the graphic generation and manipulation happens via an abstract graphical display device, which matplotlib calls a backend. This means the functions behave the same way everywhere, and all of the display details are handled by the backend. These backends may represent display screens, printers or even file storage formats. The general workflow is to do all of your drawing in memory on the abstract device and then push the final image out to the physical device in one go.

The simplest example is to plot a series of numbers stored as a list. The code looks like this:
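A minimal sketch of those two commands follows. The list values are hypothetical, chosen only for illustration, and the Agg backend is selected here purely so the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt

values = [1, 2, 3, 4, 3, 2, 1]  # hypothetical data, treated as the y-values
lines = plt.plot(values)        # x-values default to the list indices 0..6
plt.show()                      # opens the plot window on an interactive backend
```

With an interactive backend, nothing appears until show() is called.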


The first command plots the data stored in the given list as a regular line plot. If you have a single list of values, they are assumed to be the y-values, with the list index giving the x-values. Because you did not set up a specific graphics device, matplotlib assumes a default device mapped to whatever physical display you are using. After executing the first line, you won't see anything on your display. To see something, you need to execute the second command, show(). This pushes the graphics data out to the physical display (Figure 1). You should notice that there are several control buttons along the bottom of the window, allowing you to do things like save the image to a file. You also will notice that the graph you generated is rather plain. You can add axis labels with the xlabel() and ylabel() commands, for example:

plt.ylabel('Power Level')
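A fuller sketch of the labeling step, with both axes labeled. The data and the x-axis label are hypothetical; only the 'Power Level' label comes from the example above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4, 3, 2, 1])  # hypothetical data
plt.xlabel('Index')              # hypothetical label for the x-axis
plt.ylabel('Power Level')        # label for the y-axis
ax = plt.gca()                   # the axes that now carry both labels
# plt.show() would display the labeled plot on an interactive backend
```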

Figure 1. A basic plot window includes controls on the bottom of the pane.

You then get a graph with a bit more context (Figure 2). You can add a title for your plot with the title() command, and the plot command is even more versatile than that. You can change the marker being used, along with the color, by passing a format string after the data. For example, you can make green triangles with 'g^' or blue circles with 'bo'. If you want more than one plot in a single window, you simply add them as extra arguments to plot(). So, you could plot squares and cubes on the same plot with something like this:

t = [1.0,2.0,3.0,4.0]
plt.plot(t, [1.0,4.0,9.0,16.0], 'bo', t, [1.0,8.0,27.0,64.0], 'g^')

Figure 2. You can add labels with the xlabel and ylabel functions.

Now you should see both sets of data in the new plot window (Figure 3). If you import the numpy module and use arrays, you can simplify the plot command to:
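A sketch of that simplified form, assuming the same t values as above. The marker styles are chosen here only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])
# Array arithmetic is element-wise, so t**2 and t**3 replace the hand-written lists.
squares, cubes = plt.plot(t, t**2, 'bo', t, t**3, 'g^')
# plt.show() would display both data sets on an interactive backend
```

Each (x, y, format) triple passed to plot() produces its own line object, which is why the call returns two of them here.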


Figure 3. You can draw multiple plots with a single command.

What if you want to add some more information to your plot, maybe a text box? You can do that with the text() command, and you can set the location for your text box, along with its contents. For example, you could use:

plt.text(3,3,'This is my plot')

This will put a text area at x=3, y=3. A specialized form of text box is an annotation. This is a text box linked to a specific point of data. You can define the location of the text box with the xytext parameter and the location of the point of interest with the xy parameter. You even can set the details of the arrow connecting the two with the arrowprops parameter. An example may look like this:

plt.annotate('Max value', xy=(2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
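To see the text box and the annotation together, here is a small self-contained sketch. The data, the axis limits and the text position are hypothetical, picked so that the peak sits at (2, 1) and everything stays inside the visible area.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt

# Hypothetical data with a single peak at x=2, y=1
plt.plot([0, 1, 2, 3, 4], [0, 0.5, 1, 0.5, 0])
plt.ylim(0, 2)  # leave head room for the text above the curve

# Free-standing text, positioned in data coordinates
label = plt.text(0.2, 1.8, 'This is my plot')

# Annotation: xy is the point of interest, xytext is where the text sits
note = plt.annotate('Max value', xy=(2, 1), xytext=(3, 1.5),
                    arrowprops=dict(facecolor='black', shrink=0.05))
# plt.show() would display the annotated plot on an interactive backend
```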

Several other high-level plotting commands are available. The bar() command lets you draw a barplot of your data. You can change the width, height and colors with various input parameters. You even can add in error bars with the xerr and yerr parameters. Similarly, you can draw a horizontal bar plot with the barh() command. Or, you can draw box and whisker plots with the boxplot() command. You can create plain contour plots with the contour() command. If you want filled-in contour plots, use contourf(). The hist() command will draw a histogram, with options to control items like the number of bins or the location of the bin edges. There is even a command called xkcd() that sets a number of parameters so all of the subsequent drawings will be in the same style as the xkcd comics.
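As a rough illustration of two of these commands, the sketch below draws a bar plot with error bars next to a histogram. All of the numbers are made up, and the random samples are seeded only so the result is repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt
import numpy as np

fig, (ax_bar, ax_hist) = plt.subplots(1, 2)  # two plots side by side

# Bar plot: hypothetical heights with vertical error bars via yerr
heights = [3, 5, 2]
bars = ax_bar.bar([1, 2, 3], heights, width=0.5, color='green',
                  yerr=[0.4, 0.6, 0.3])

# Histogram: 200 hypothetical samples sorted into 10 equal-width bins
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
counts, edges, patches = ax_hist.hist(samples, bins=10)
# plt.show() would display both plots on an interactive backend
```

hist() returns the per-bin counts and the bin edges along with the drawn patches, which is handy if you also want the binned numbers themselves.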

Sometimes, you may want to be able to interact with your graphics. matplotlib needs to interact with several different toolkits, like GTK or Qt. But, you don't want to have to write code for every possible toolkit. The pyplot sub-module includes the ability to add event handlers in a GUI-agnostic way. The FigureCanvasBase class contains a function called mpl_connect(), which you can use to connect some callback function to an event. For example, say you have a function called onClick(). You can attach it to the button press event with this command:

fig = plt.figure()
cid = fig.canvas.mpl_connect('button_press_event', onClick)

Now when your plot gets a mouse click, it will fire your callback function. The call to mpl_connect() returns a connection ID, stored in the variable cid in this example, that you can use to work with this callback function. When you are done with the interaction, disconnect the callback function with:

fig.canvas.mpl_disconnect(cid)


If you just need to do basic interaction, you can use the ginput() command. It will listen for a set amount of time and return a list of all of the clicks that happen on your plot. You then can process those clicks and do some kind of interactive work.
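To tie the connect and disconnect steps together, here is a sketch of the full life cycle of a callback. The body of onClick() is hypothetical, and the click is simulated in code (by building a MouseEvent by hand) only so the sketch can run without a display. In normal use, the events come from the GUI window after plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

clicks = []

def onClick(event):
    # Hypothetical handler: record the pixel position of each click
    clicks.append((event.x, event.y))

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4])  # hypothetical data
cid = fig.canvas.mpl_connect('button_press_event', onClick)

# Simulated click at pixel (100, 100), standing in for a real mouse press
fake_click = MouseEvent('button_press_event', fig.canvas, 100, 100, button=1)
fig.canvas.callbacks.process('button_press_event', fake_click)

# After disconnecting, the same event no longer reaches onClick()
fig.canvas.mpl_disconnect(cid)
fig.canvas.callbacks.process('button_press_event', fake_click)
```

Only the first simulated click is recorded, because the handler is disconnected before the second one is dispatched.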

The last thing I want to cover here is animation. matplotlib includes a sub-module called animation that provides all the functionality that you need to generate MPEG videos of your data. These movies can be made up of frames of various file formats, including PNG, JPEG or TIFF. There is a base class, called Animation, that you can subclass and add extra functionality. If you aren't interested in doing too much work, there are included subclasses. One of them, FuncAnimation, can generate an animation by repeatedly applying a given function and generating the frames of your animation. Several other low-level functions are available to control creating, encoding and writing movie files. You should have all the control you require to generate any movie files you may need.
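A minimal sketch of the FuncAnimation approach follows. The sine wave, the update() function, the frame count and the output file name are all hypothetical, and writing a movie file typically depends on an external encoder such as ffmpeg being installed, so the save call is left as a comment.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption for portability); omit it to get a real window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x))  # the artist that each frame will modify

def update(frame):
    # Called once per frame: shift the phase of the sine wave
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)

# 50 frames, 40 ms apart; keep a reference so the animation is not garbage collected
anim = FuncAnimation(fig, update, frames=50, interval=40)

# anim.save('wave.mp4')  # assumption: requires a movie writer such as ffmpeg
```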

Now that you have matplotlib under your belt, you can generate some really stunning visuals for your latest paper. Also, you will be able to find new and interesting relationships by graphing your data. So, go check your data and see what might be hidden there.
