New Projects - Fresh from the Labs

SOFA—Statistics Open For All (

If statistics is your game, and you're chasing an easy-to-use and comprehensive package that outputs great-looking charts, look no further. According to the Web site: “SOFA is a user-friendly statistics, analysis and reporting program. It is free, with an emphasis on ease of use, learn as you go and beautiful output. SOFA lets you display results in an attractive format ready to share.”

SOFA Statistics provides a highly flexible visualization system for analyzing complex data.

A montage of some of SOFA's beautiful graphs and charts, generated on the fly.

My favorite feature of SOFA's is its ability to generate HTML pages of your work dynamically, which can be viewed by anyone with a browser.


Binary packages are available for Linux, Windows and Mac (with Linux at the top of the list). Sadly, the Linux binary is only for Ubuntu, but the obligatory source also is available. Ubuntu users can grab the .deb and work things out for themselves, but the source is a bit trickier. At the time of this writing, the installation process was in a state of flux, so project maintainer Grant Paton-Simpson will have some special instructions up at the Web site for LJ readers when this article is printed.

As far as library requirements, here's what Grant told me you need:

  • python (>= 2.6.2).

  • wx-common (>=

  • python-wxversion (>=

  • python-wxgtk2.8 (>=

  • python-numpy (>= 1:1.2.1).

  • python-pysqlite2 (>= 1.0.1).

  • python-mysqldb (>= 1.2.2).

  • python-pygresql (>= 1:4.0).

  • python-matplotlib (>=

  • python-webkit (>= 1.0.0).

Once the program is installed, you should be able to find SOFA Statistics in your menu; otherwise, you'll need to run it from a terminal. If you need to use the command line, enter:

$ python /usr/share/pyshared/sofa/

This path may be different on some distributions, and Grant may have made a link to a bin directory by the time this article is published (meaning you could start SOFA Statistics with a simple one-word command).


Grant has gone to a lot of effort making some excellent video tutorials, and there's no way I can improve upon them, so instead, I concentrate on highlighting cool features here. Again, Grant appears to be one step ahead of me in that he's provided a default set of preloaded values you can use to explore the project with ease, rather than going through the laborious process of first having to learn how to enter data and then making it display something meaningful. For now, let's look at the three main sections: Report Tables, Charts and Statistics.

Under Report Tables, choose some random settings under Table Type, provide names for Title and Subtitle, and choose some of the available data fields with the Add button. Now click Run, and a swank new table is presented to you. Don't like the aesthetics? No problem. The Style output using... drop-down box lets you change the border to something more pleasing—a nice touch.

The pièce de résistance is probably the Charts section. This is where you can play around with the charts you see here in the screenshots, and more and more chart types are being added over time. Whether you want a bar graph, pie chart, line graph or something like a Scatterplot configuration, chances are it's doable. Play with some values in the Variables section, choose a Chart Type, click Run and a beautiful chart appears.

The Statistics section is where the elegance of design and data flow really come into play. This section is a bit beyond me, but here you can run statistical tests on your data, with a focus on the kind of tests most users need, most of the time. You can choose from common tests, such as ANOVA or Chi Square, or run through a check list of choices to choose what's right for you. Click Configure Test on the right, and you'll be presented with the final screen.

From here, you can choose which variables and groupings you want to test against. And finally, click Run. This section gives you the most impressive of the readouts and provides a comprehensive bundle of tables and graphs of analyzed statistics.

However, one of the most impressive and practical features under all three of these main sections is the Send output to... feature, with its View button. Here you actually can view each page of output in any Web browser in HTML format. This gives the project some instant credibility and practicality in that any work you do in SOFA can be opened instantly by anyone (like your coworkers) on their own computers, without needing to install SOFA Statistics. Plus, the information they see will be presented professionally with some impressive graphics to boot.

Although SOFA Statistics is still in its slightly buggy developmental stage, project maintainer Grant Paton-Simpson has shown an impressive grasp of what needs to be included in SOFA, from the small touches to the big. My hope is that this program becomes an adopted industry standard of sorts, mentioned in everyday conversation by organization workers the world over. And, given its free and multiplatform nature, combined with a very canny coder and designer, this hope of mine may not be an unrealistic one.


John Knight is the New Projects columnist for Linux Journal.