OpenOffice.org Calc: The Mysteries of DataPilots Revealed

If you're coming fresh from Microsoft Excel, you might wonder where the Pivot tables are in OpenOffice.org Calc. The problem is, they're masquerading under the name of DataPilots. But, under any name, DataPilots are Calc's way of allowing you to quickly reorganize information in a range of cells so that you can gain a new insight into them. In a way, you could think of them as the spreadsheet equivalent of queries in databases. They aren't difficult to create, but they do take a little bit of thought to set up correctly.

A DataPilot requires a range of cells as a source. However, to work properly, the range must be organized as it would be in a database, with labels at the top of each column. For example, if you are a college instructor, you might have created a spreadsheet for entering grades.

raw-data

Without the labels, or adding them to rows will leave you unable to create a DataPilot.

To create a DataPilot, either highlight a range of shells in the current spreadsheet or locate a database that is registered in OpenOffice.org (go to File -> New -> Database ->Open an existing database to register one). Then select Data -> DataPilot -> Start and make your selection.

select-source

Once you select your data source, the DataPilot window opens. It shows all the available fields on the right, and blank template for a sheet on the right. The blank template includes 4 areas: the Page field, or title; the column or row field for headers, and the data fields for information. You drag any of the fields to a position on the blank template, or change the position of a field already placed on the blank template. If you want to discard a placed field, drag it off the blank template to somewhere else, and it reappears in the list of unused fields on the right.

Press the More button, and you get additional options. You can change the range of cells you are using, or, instead of displaying the DataPilot on a blank sheet of the current spreadsheet -- which is the default -- you can change the Results to field to undefined and then select an existing sheet.

You also have six other options under More, of which Ignore empty rows is probably the most useful. However, you may want to turn off the Total columns and Total rows if totals are irrelevant to your purpose. The same is true of the Add filter option, which allows you to display only those records that you select. As for the Identify categories option, which places information in a column without a label in the row above, you probably can do without it. The same is true of Enable drill to details if all you want to do is rearrange data and not perform any functions on it. For simple DataPilots, you can make life easy for yourself and accept the existing default options, but for more advanced ones, you might need to experiment with these options to get the results you want.

more

Setting up a DataPilot

The tricky part of using a DataPilot is arranging the fields in the template so that you don't produce complete gibberish.

The Page Fields is easy to figure out. Most likely it's just the title for the range of cells. Otherwise, it is unlikely to be related to any other fields.

The problem lies in the other fields. You might imagine that all you have to do is place some fields in the Column Fields square on the template and others in the Row Fields square. However, if you do that, the result is unreadable.

Instead, place one field in the Column Fields or Row Fields square. This field should be one that organizes the others. For instance, in a grade sheet, that might be the name. Then place other fields in the Data Square field.

In the simplest case, you can use a DataPilot to reposition data so that you compare different parts more easily. For instance, in a grade sheet for a class, you might be interested at the end of the semester in seeing whether the exams are any indicator of students' final grades. In this case, you would put the Name field in the Column Fields, and the Mid-Term, Final Exam and Term fields in Data Fields. You could do much the same by hiding columns, but creating a DataPilot would be far faster, at least for an experienced user.

simple-datapilot

However, if you notice, the fields added as Data Fields are listed now have "Sum -" as a prefix to their name. That's because, as you might guess, you are not limited to simply to displaying existing data. If you choose, you can perform some basic statistical functions on them as well, either by selecting a field from the Data Fields followed by the Options button, or clicking on the field in the template.

For instance, as you look at a grade sheet, you might want to know the average results on the course exams, so you can decided whether their difficulty is appropriate (Note that Calc refers to Average in the list of functions, but displays Mean).

datapilot-with-functions

In this case, you would not be particularly interested in the individual scores -- the average of a single figure being that figure -- but in the column totals (which, just to confuse some of you -- as well as me -- appear in their own separate columns, not at the bottom of each column). By contrast, if you selected row totals, all you would get would be a repetition of the figures for individual students.

Because of such complications, be prepared to take several tries when you construct a DataPilot, especially at first. If you do make a mistake, you can highlight the DataPilot and select Data -> DataPilot -> Delete before you try again -- or simply delete the sheet that the reject DataPilot appears on.

Be warned, too, that, while DataPilots remain linked to their source material, they do not automatically refresh if the source material changes. If you know or suspect that the source material has changed, select a DataPilot and then Data -> DataPilot -> Refresh to update.

Conclusion

More than one new user has despaired of getting DataPilots right. And, no question, they are idiosyncratic, not just in the correct placement of data, but also in such limitations as the inability to allow you to perform multiple functions on the same field without creating a new DataPilot.

One way to make your use of DataPilots slightly easier is to download DataPilot Tools from the OpenOffice.org extensions site. This extension adds the ability to show the source range (so that you can backtrack from the DataPilot), change the source range (in case you made an error), and refresh all the DataPilots in a spreadsheet (rather than refreshing them individually). All these tools can be tremendous time-savers if you frequently use DataPilots.

Even with this aid, you need to practice before you can use DataPilots. But, once you are used to them, you should find DataPilots convenient ways to perform statistical analyses without setting up each formula individually.

AttachmentSize
data-pilot.png25.2 KB
datapilot-with-functions.png48.52 KB
function.png31.78 KB
more.png11.78 KB
raw-data.png11.13 KB
______________________

-- Bruce Byfield (nanday)

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState