SciPY for Scientists

In my last article, I looked at NumPY and some of its uses in numerical simulations. Although NumPY does provide some really robust building blocks, it is a bit lacking in more sophisticated tools. SciPY is one of the many Python modules that build on NumPY's. In fact, SciPY has become sort of the de facto science package in Python programming. If you have a scientific problem you are trying to solve, you could do worse than starting with SciPY. Not only are there more advanced functions and objects available to do linear algebra, but there also are functions and objects to handle calculus, interpolation, signal processing and Fast Fourier Transforms, among others. So many functions are available, they actually are grouped together into sub-packages. In this article, I take a quick look at what sorts of functions are available and how to use them to get some serious work done.

To start, you need to import the main scipy module. You would do this with the usual:


import scipy

This imports the common set of functions and objects used in SciPY. It also imports the most-used parts of NumPY, because they are so fundamental to the work for which SciPY is used. If you need anything else from NumPY, you need to import the NumPY module explicitly. In many cases, that is something you will want to do anyway. All of the extra functions in the individual sub-packages need to be imported explicitly. So, if you want to do some signal processing, you would need to use this:


from scipy import signal

The simplest package in SciPY probably is the constants sub-package. This package provides a basic set of physical constants that are most used, like pi or Avogadro's number. It also includes a much larger set of constants from the 2010 CODATA database. These physical constants are stored as a tuple of value, unit and uncertainty, and they include items as diverse as the alpha particle mass to the Wien wavelength displacement law constant. The scipy.misc sub-package contains all of those bits and pieces that don't really fit anywhere else. Here, you can find functions like factorial (to calculate the factorial of a number) and imread (to read an image file into Python).

Linear algebra is one of the heavy uses of computational code. SciPY includes a sub-package called linalg, which is a wrapper for the package linalg within NumPY. All of the functionality from NumPY is included in scipy.linalg, along with several other functions. In the NumPY module, these linear algebra functions may or may not be handled by external libraries, depending on how NumPY was compiled. With SciPY, this is no longer an option. It needs to be compiled with the ATLAS LAPACK and BLAS libraries to handle the actual numerical work in an optimized fashion. There are functions to handle things like finding an inverse, determinant or transpose of a matrix. If you need to solve a system of equations, you can do so with a single function call. If you start with a coefficient matrix, A, and a right-hand side vector, b, you can find the solution vector for your system with:


from scipy import linalg
linalg.solve(A,b)

In many physics and engineering problems, you need to find eigenvalues and eigenvectors. The linalg sub-package provides very fast functions for doing that as well.

Most people default to using R to do statistics, but you don't have to. SciPY includes a stats sub-package that provides many of the functions you will need in the majority of cases. The describe function will give you the basic statistical description of a vector of samples. This includes the mean, variance, skew and kurtosis. Once you have some basic statistics, you probably will want to run a t-test to see how well your data matches your model. You can do this with something like:


stats.ttest_1samp(x, m)

where x is your data and m is your model. This will give you a t-statistic and a p-value. Just as in R, there are many more complicated statistical functions available to you.

A topic near and dear to my heart is solving differential equations. SciPY can help with that task too. The sub-package you need is named integrate. There are two sets of functions, one that takes a function object as the input and one that takes a set of fixed samples. You can do single, double and triple integrations on a function object with the functions quad, dblquad and tplquad. If you have data from some experiment, you integrate it with the trapezoidal rule, Simpson's rule or Romberg Integration. If you are working with ordinary differential equations, some special functions are available. The function odeint will solve a set of ordinary differential equations with a given set of initial conditions.

Last, but not least, let's look at the weave sub-package. Even though SciPY already is full-featured, it can't cover every eventuality. Although you always can write the code in pure Python for whatever piece is missing, sometimes you need to squeeze every last cycle out of your hardware. In those cases, you probably want to write some optimized C code to do the heavy lifting. Although you could write this and compile it as an external object file, that is far too much work for any self-respecting programmer. Enter the weave sub-package.

With weave, you can add C code from within your Python program in a number of ways. The most direct is the inline function. With this, you can write out your C or C++ code, compile it and run it directly within your Python program. All of your Python objects are available within the scope of your inlined code. The contents of any mutable objects are changeable from within your C/C++ code. If you want to return results to your Python program, these are available in a special variable called "return_val". A trivial example, from the SciPY documentation, uses printf to show how the inline function works:


import weave
a = 1
weave.inline('printf("%d\\n",a);',['a'])

The general form for the inline function is a string containing the code to compile and run, and a list of the Python variables to make available to the C/C++ code. If you have a larger fragment of code you want to inline, you can use triple quotes to define a code block and save it to a variable first. For example, you may have something like:


code = """
   for (int i=0; i<a; i++) {
      printf("%d\\n", i);
   }"""
weave.inline(code, ['a'])

Another way to speed up your code is to let Python do it for you with the blitz function. In this case, blitz takes some NumPY expression and creates C++ code and compiles it to an external module. The first time you do this, it may take several minutes to generate the code and compile it. Once this is done, the compiled object file is stored to be reused the next time it is called. Now you can see a speedup of 2–10 over just straight Python code. It is also saved after Python closes, so you can reuse it the next time you run your Python code.

Now you have some tools available to do some real scientific computations. In my next article, I'll look at matplotlib, one of the ways available to visualize all of this computational work you have been doing. Until then, get some science done.

Load Disqus comments