Data Manipulation with Sprog


I don't think I know anyone who relishes the task of data manipulation, and I'm certainly no different. Some of the more complicated manipulations pose a briefly satisfying technical challenge, but in the end, data manipulation is boring. Sometimes I'm able to import a dataset into OpenOffice's spreadsheet, but usually I have to write a Perl script or a Bash script to do what needs to be done. The programs usually aren't difficult, and I tend to have small snippets of code lying around to take care of common tasks. Even so, it's just not... fun.

I happened to come across Sprog a few years ago; at the time I was looking for something completely unrelated, but thanks to the wonders of the Internet, I noticed the Sprog program and investigated it further.

Sprog allows you to solve data manipulation problems by dragging and connecting various gears together to build a “machine.” Sprog provides gears for reading files, fetching and parsing web pages, handling CSV files, running small Perl snippets, and finally displaying or writing the results.

Sprog does have a few software requirements:

GTK+ libraries
libgnomecanvas*
libglade
Perl - preferably version 5.8, along with the following CPAN modules:
Gtk2 Perl bindings
Gnome2::Canvas*
Gtk2::GladeXML
YAML
Pod::Simple

Once these are satisfied, installing Sprog is as simple as:

perl Makefile.PL
make install

Finally, you start the program with the sprog command. Sprog presents you with a blank machine canvas.


Figure 1.

As you can see, there is an assortment of gears available and all you have to do is drag them onto the canvas, configure them, and connect them in order. Each gear has an input tab and an output tab, as appropriate. Each tab is keyed so that you can't connect gears together in ways that don't make sense. For example, you can't connect a “Retrieve URL” gear to an “Add Field Names” gear; the gears simply don't “fit” and the results wouldn't make sense. I've started building a simple machine in Figure 2.


Figure 2.


Figure 3.

By right-clicking and selecting Properties for each gear, I can tell the machine which file to open and which pattern to look for. Once all of the gears are connected, as in Figure 3, the machine will read the file, find all of the lines that match the pattern I supplied, convert each matching line to upper case, and display the results in a text window.

As you can see, the gears fit together like puzzle pieces. A machine starts out with an input gear such as a “Read File” gear or a “Retrieve URL” gear. From there, the data flows into the next gear in the machine. Each gear performs a particular function on its input and passes the results to the next gear. Finally, the data gets to the machine's output gear. Sprog has output gears for displaying the results in a text window, writing to a data file, or piping the results to a command.
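If you think in shell pipelines, the flow from input gear to output gear will feel familiar. The sketch below is only a rough shell analogue of the machine in Figure 3, not a description of how Sprog runs its gears internally. The function name match_and_upcase and the sample data are my own inventions for illustration.

```shell
#!/bin/sh
# Rough shell analogue of the Figure 3 machine:
# read a file -> keep matching lines -> upper-case them -> show the result.
# match_and_upcase is a hypothetical helper name, not part of Sprog.
match_and_upcase() {
    pattern=$1
    file=$2
    # grep stands in for the pattern-matching gear,
    # tr stands in for the upper-case gear.
    grep -e "$pattern" -- "$file" | tr '[:lower:]' '[:upper:]'
}

# Demo on a small throwaway file (made-up data).
sample=$(mktemp)
printf 'alpha error one\nbeta ok\ngamma error two\n' > "$sample"
match_and_upcase 'error' "$sample"
rm -f "$sample"
```

Each stage of the pipe plays the role of one gear, and the terminal plays the role of the text window.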

Let's consider another rather trivial example. In Figure 4, I've created a machine that takes the output of the ls -la command and prints it out in tab-delimited format with just the filename and permissions fields. Sure, this is a simple task, but it lets us discuss various features of the Sprog machine.


Figure 4.

I configured the “Run Command” gear to run the “ls -la” command. The first “Perl Code” gear simply had this snippet of Perl code in it:

s/\ +/,/g;

This code took each line of input from the default variable ($_), changed each run of one or more spaces into a single comma, and output a CSV data stream.

The “CSV Split” gear accepted that data stream and split it out for use by the “Select Columns” gear. The “Select Columns” gear was configured to select columns 1 and 9 and send the results to the next gear.

The “List to CSV” gear converts the input for use by the second “Perl Code” gear, which simply converts each comma to a tab.

Finally, the results are displayed in a text window.
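For comparison, here is a minimal shell sketch of the same chain, one pipe stage per gear. It is an approximation under my own assumptions, not Sprog's implementation: perms_and_names is a hypothetical name, sed stands in for the first Perl snippet, and cut stands in for the “CSV Split” and “Select Columns” pair.

```shell
#!/bin/sh
# Shell approximation of the Figure 4 machine.
# perms_and_names is a hypothetical helper name, not part of Sprog.
perms_and_names() {
    sed 's/  */,/g' |   # first Perl Code gear: each run of spaces becomes one comma
    cut -d, -f1,9 |     # CSV Split + Select Columns: keep columns 1 and 9
    tr ',' '\t'         # second Perl Code gear: comma becomes tab
}

# Demo on one canned line in "ls -la" style (made-up file entry).
printf '%s\n' '-rw-r--r--  1 mike users 1234 Jan  5 10:00 notes.txt' | perms_and_names
```

To try it on real data, run ls -la | perms_and_names. Like the machine itself, this naive split on spaces goes wrong for filenames that contain spaces or commas, and the "total" line that ls prints first has fewer than nine columns.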

Now sure, I could have coded this up in Perl and been done with it in less than a minute. But someone who didn't know Perl, or didn't want to learn it, would be able to assemble these and other gears into a machine that accomplished a given data manipulation goal in a way that they could understand. In fact, it's not unreasonable to expect to be able to store an entire library of gears and leave it to someone else to assemble them to accomplish a given task. For example, one might consider building a gear that collects the Apache log files from a given server. Then one might create a gear that uses a Perl routine to parse each log entry into CSV format. From there, a person could assemble a machine to output reports in almost any format they wanted.
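To give a feel for what such a log-parsing step might do, here is a hedged sketch written as a shell filter rather than as an actual Sprog gear. It assumes the log uses Apache's Common Log Format and keeps only five fields (host, timestamp, request, status, bytes). The name log_to_csv and the choice of fields are my own assumptions.

```shell
#!/bin/sh
# log_to_csv is a hypothetical helper, not a Sprog gear.
# Assumption: input lines follow Apache's Common Log Format, e.g.
#   host ident user [timestamp] "request" status bytes
# Lines that do not match the pattern pass through unchanged.
log_to_csv() {
    sed -E 's/^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)\] "([^"]*)" ([0-9]+) ([^ ]+).*$/\1,"\2","\3",\4,\5/'
}

# Demo on a single sample entry.
printf '%s\n' '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' | log_to_csv
```

The timestamp and request are quoted because they contain spaces. A request line that itself contains a double quote would defeat this simple pattern, so treat it as a starting point only.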

The concept of having ready-made gears aimed at solving common problems implies that we can capture these gears and reuse them. Sprog allows us to save a given machine, of course, but it also allows us to save completely unconnected gears and essentially build a library of “puzzle pieces” that solve various problems. The end user simply assembles the parts needed to solve a given problem. To make this concept clearer, let's look at the content of a saved machine file.

- Sprog
- 1
- run_on_drop: 0
-
  - CLASS: Sprog::Gear::CommandIn
    ID: 6
    NEXT: 14
    X: 334
    Y: 173
    prop:
      command: 'ls -la '
      title: Run Command
  - CLASS: Sprog::Gear::CSVSplit
    ID: 11
    NEXT: 12
    X: 334
    Y: 253
    prop:
      title: CSV Split
  - CLASS: Sprog::Gear::TextWindow
    ID: 7
    NEXT: ~
    X: 334
    Y: 413
    prop:
      auto_scroll: ''
      clear_on_run: 1
      show_end_events: ''
      show_start_events: ''
      title: Text Window
  - CLASS: Sprog::Gear::SelectColumns
    ID: 12
    NEXT: 10
    X: 334
    Y: 293
    prop:
      base: 1
      columns: '1,9'
      title: Select Columns
  - CLASS: Sprog::Gear::PerlCode
    ID: 14
    NEXT: 11
    X: 334
    Y: 213
    prop:
      perl_code: 's/\ +/,/g;'
      title: Perl Code
  - CLASS: Sprog::Gear::PerlCode
    ID: 16
    NEXT: 7
    X: 334
    Y: 373
    prop:
      perl_code: 's/,/\t/g;'
      title: Perl Code
  - CLASS: Sprog::Gear::ListToCSV
    ID: 10
    NEXT: 16
    X: 334
    Y: 333
    prop:
      title: List to CSV

Fortunately, the file format is ASCII, and fairly intuitive. Essentially, it defines seven gears, giving each a unique ID number. For example, if you look at the gear with ID 14, you see that it's a “Perl Code” gear and that it executes 's/\ +/,/g;' on its input. The title of the gear is “Perl Code,” but I'm sure we can come up with something more descriptive, perhaps “Strip out all spaces and convert them to commas.” Changing the name of a gear is as easy as changing its title in the saved machine file. The NEXT field shows that the next gear in the sequence is gear 11. So, continuing our thoughts from above, we could create a library of saved gears, then modify the save file so that each gear is well described. Finally, we could load the library in Sprog and assemble machines to accomplish whatever data manipulation we need to do.
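Because the gears are not stored in running order, it can help to follow the ID and NEXT fields mechanically. The following is a small sketch that does so with awk. It is my own illustration rather than a tool that ships with Sprog: gear_order is a hypothetical name, and the sample file is the listing above reduced to its CLASS, ID, NEXT, and title lines.

```shell
#!/bin/sh
# gear_order is a hypothetical helper, not part of Sprog. It prints the
# gears of a saved machine in running order by following ID -> NEXT links.
gear_order() {
    awk '
        # Drop indentation and any leading "- " list marker.
        { sub(/^[ \t]*(-[ \t]+)?/, "") }
        /^CLASS:/ { class = $2 }
        /^ID:/    { id = $2; cls[id] = class; seen[id] = 1; total++ }
        /^NEXT:/  { next_of[id] = $2; if ($2 != "~") target[$2] = 1 }
        /^title:/ { sub(/^title:[ \t]*/, ""); title[id] = $0 }
        END {
            # The first gear is the one that no other gear points to.
            for (i in seen) if (!(i in target)) start = i
            # Follow the chain; the counter guards against a looping file.
            for (g = start; (g in seen) && n < total; g = next_of[g]) {
                n++
                printf "%s\t%s\t%s\n", g, cls[g], title[g]
            }
        }
    ' "$1"
}

# Demo: the saved machine from this article, trimmed to the relevant keys.
machine=$(mktemp)
cat > "$machine" <<'EOF'
- Sprog
- 1
- run_on_drop: 0
-
  - CLASS: Sprog::Gear::CommandIn
    ID: 6
    NEXT: 14
    prop:
      title: Run Command
  - CLASS: Sprog::Gear::CSVSplit
    ID: 11
    NEXT: 12
    prop:
      title: CSV Split
  - CLASS: Sprog::Gear::TextWindow
    ID: 7
    NEXT: ~
    prop:
      title: Text Window
  - CLASS: Sprog::Gear::SelectColumns
    ID: 12
    NEXT: 10
    prop:
      title: Select Columns
  - CLASS: Sprog::Gear::PerlCode
    ID: 14
    NEXT: 11
    prop:
      title: Perl Code
  - CLASS: Sprog::Gear::PerlCode
    ID: 16
    NEXT: 7
    prop:
      title: Perl Code
  - CLASS: Sprog::Gear::ListToCSV
    ID: 10
    NEXT: 16
    prop:
      title: List to CSV
EOF
gear_order "$machine"
rm -f "$machine"
```

For this file the gears come out in the order 6, 14, 11, 12, 10, 16, 7, which matches the machine described earlier: run the command, turn spaces into commas, split, select columns, rebuild the CSV, turn commas into tabs, and display.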

Once a library of gears has been created, we can distribute them to other people to assemble in order to solve recurring problems. It's kind of nice to create tools that other people can use to solve business problems without having to understand how they work. On the other hand, I don't want to rewrite the same snippets of code to solve common problems. As I see it, everybody wins!

I can easily imagine creating Sprog gears that access SQL databases or Apache log files. I can imagine incorporating Perl's filtering capabilities into a gear aimed at analyzing an Apache log file or an e-mail log file. I can even see creating a gear to output a spreadsheet in native Excel format.

After using Sprog for a bit, I've come up with a few hints that will make it easier for you to use. While Sprog does implement a snap-on function, it seems easier to grab a given gear by the “gear” icon and slightly overlap it with the previous gear; you will see it snap into place. Otherwise, it is often difficult to get gears to attach to each other. Also, it seems that Sprog imposes a strict top-down approach to solving problems. There is no branching in Sprog. Sprog implements a series of gears, not a transmission. Finally, I've found that if I make a change to a given gear, I need to re-attach it to the gear before it. I guess this makes sense, but it led to a lot of initial frustration until I realized what was happening.

I'm not sure I think that Sprog is easier than simply writing a Perl script to perform a particular data manipulation task, but it is certainly a lot more interesting and it's something that can be delegated to the actual consumers of the data and their results, thus empowering them to fulfill their own data manipulation requirements. Not everyone can write in Perl and Sprog is a nice way of empowering people to manipulate data in a transparent and repeatable fashion.

______________________

Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programming, and VoIP. Mike lives in Albuquerque, NM, with his wife and 3 sons. He can be reached at mdiehl@diehlnet.com

Comments


That's pretty slick....

Kevin Ogden

It's very much like Automator in OS X with a focus on data manipulation. And yes, Automator can perform such tasks with the appropriate actions, including a lot of higher-level operations with actions installed by applications. It's very integrated with the desktop environment as well as most apps.

It'd be awesome to see that kind of integration for this. I usually end up writing a mass of shell scripts and using pipes on most of my *nix boxes for tasks such as this.

Helpful, but...

mats

Mike,

thanks for the article about Sprog - this really looks like a helpful tool. Just for the convenience of casual readers like myself, would you mind to put up a link to the program's website as well?

Cheers, mats

mac os x

Anonymous

Anyone got a step by step guide for installing under mac os x ? Fink doesn't fink the package and running perl make..... results in

perl Makefile.PL
Checking if your kit is complete...
Looks good
Warning: prerequisite Gnome2::Canvas not found.
Warning: prerequisite Gtk2 not found.
Warning: prerequisite Gtk2::GladeXML not found.
Warning: prerequisite YAML not found.
Writing Makefile for Sprog

Follow the yellow brick road...

Richie

The screenshot was from Christian Renz; Google the phrase ["Christian Renz" sprog], and you get the Sprog forum listings, one of which was started by mister Renz, answering this very question: http://sourceforge.net/mailarchive/forum.php?thread_name=2c743b50cb13f33...

Neat! Thanks Christian!

Sprog on OS X

DarnitOL

If you look closely at the SourceForge screenshots of Sprog in action, one talented lad managed to compile it on OS X.

Darned if I can figure out how he did it; instead of Fink, I use DarwinPorts (spare me the debate) and it has nothing for Gnome2 or YAML.

Reading the SourceForge e-mail list archives, our Mac-happy friend made some references to some changes in the programming - but that wouldn't solve the dependency issues as it was 1 or 2 lines - and the programmers committed his changes to the tree besides, so we should be able to safely ignore that conversation.

Unless he had it running in a VM, we are missing a large part of that conversation. And since this project hasn't seen much recent development, I will be surprised if anyone can help us.

great UI for an iphone

Anonymous

great UI for an iphone interface for command line/scripting! its already OSX'd.. you're 90% there already..

Do'h!

Anonymous

Ya, sorry about that. Here is the website:

http://sprog.sourceforge.net/

Getting sprog

dan

For some strange reason, the article contains no info on where to get sprog. You can download it here:

http://sprog.sourceforge.net/
