New Projects - Fresh from the Labs

SOFA—Statistics Open For All (www.sofastatistics.com)

If statistics is your game, and you're chasing an easy-to-use and comprehensive package that outputs great-looking charts, look no further. According to the Web site: “SOFA is a user-friendly statistics, analysis and reporting program. It is free, with an emphasis on ease of use, learn as you go and beautiful output. SOFA lets you display results in an attractive format ready to share.”

SOFA Statistics provides a highly flexible visualization system for analyzing complex data.

A montage of some of SOFA's beautiful graphs and charts, generated on the fly.

My favorite feature of SOFA's is its ability to generate HTML pages of your work dynamically, which can be viewed by anyone with a browser.

Installation

Binary packages are available for Linux, Windows and Mac (with Linux at the top of the list). Sadly, the Linux binary is only for Ubuntu, but the obligatory source also is available. Ubuntu users can grab the .deb and work things out for themselves, but the source is a bit trickier. At the time of this writing, the installation process was in a state of flux, so project maintainer Grant Paton-Simpson will have some special instructions up at the Web site for LJ readers when this article is printed.

As far as library requirements, here's what Grant told me you need:

  • python (>= 2.6.2).

  • wx-common (>= 2.8.9.2).

  • python-wxversion (>= 2.8.9.2).

  • python-wxgtk2.8 (>= 2.8.9.2).

  • python-numpy (>= 1:1.2.1).

  • python-pysqlite2 (>= 1.0.1).

  • python-mysqldb (>= 1.2.2).

  • python-pygresql (>= 1:4.0).

  • python-matplotlib (>= 0.98.5.2).

  • python-webkit (>= 1.0.0).

Once the program is installed, you should be able to find SOFA Statistics in your menu; otherwise, you'll need to run it from a terminal. If you need to use the command line, enter:

$ python /usr/share/pyshared/sofa/start.py

This path may be different on some distributions, and Grant may have made a link to a bin directory by the time this article is published (meaning you could start SOFA Statistics with a simple one-word command).

Usage

Grant has gone to a lot of effort making some excellent video tutorials, and there's no way I can improve upon them, so instead, I concentrate on highlighting cool features here. Again, Grant appears to be one step ahead of me in that he's provided a default set of preloaded values you can use to explore the project with ease, rather than going through the laborious process of first having to learn how to enter data and then making it display something meaningful. For now, let's look at the three main sections: Report Tables, Charts and Statistics.

Under Report Tables, choose some random settings under Table Type, provide names for Title and Subtitle, and choose some of the available data fields with the Add button. Now click Run, and a swank new table is presented to you. Don't like the aesthetics? No problem. The Style output using... drop-down box lets you change the border to something more pleasing—a nice touch.

The pièce de résistance is probably the Charts section. This is where you can play around with the charts you see here in the screenshots, and more and more chart types are being added over time. Whether you want a bar graph, pie chart, line graph or something like a Scatterplot configuration, chances are it's doable. Play with some values in the Variables section, choose a Chart Type, click Run and a beautiful chart appears.

The Statistics section is where the elegance of design and data flow really come into play. This section is a bit beyond me, but here you can run statistical tests on your data, with a focus on the kind of tests most users need, most of the time. You can choose from common tests, such as ANOVA or Chi Square, or run through a check list of choices to choose what's right for you. Click Configure Test on the right, and you'll be presented with the final screen.

From here, you can choose which variables and groupings you want to test against. And finally, click Run. This section gives you the most impressive of the readouts and provides a comprehensive bundle of tables and graphs of analyzed statistics.

However, one of the most impressive and practical features under all three of these main sections is the Send output to... feature, with its View button. Here you actually can view each page of output in any Web browser in HTML format. This gives the project some instant credibility and practicality in that any work you do in SOFA can be opened instantly by anyone (like your coworkers) on their own computers, without needing to install SOFA Statistics. Plus, the information they see will be presented professionally with some impressive graphics to boot.

Although SOFA Statistics is still in its slightly buggy developmental stage, project maintainer Grant Paton-Simpson has shown an impressive grasp of what needs to be included in SOFA, from the small touches to the big. My hope is that this program becomes an adopted industry standard of sorts, mentioned in everyday conversation by organization workers the world over. And, given its free and multiplatform nature, combined with a very canny coder and designer, this hope of mine may not be an unrealistic one.

______________________

John Knight is the New Projects columnist for Linux Journal.

Webcast
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers

Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions