Building Impress and PowerPoint Slides with LaTeX and Perl
Let's begin with a story. Here's what happened: my second book, coauthored with Dr Michael Moorhouse, finally was finished. I had spent an extra six months on it, which meant it now was at least six months late. I had spent every spare minute typesetting, proofreading, writing, manually converting Michael's Microsoft Word files to LaTeX, reading and then re-reading. Then, I'd proofread it all again. When it was done and dusted, I was jaded. Soon after, I received the final proof of the cover. And there it was—printed right on the back cover—a promise to provide Microsoft PowerPoint slides on the Web site for use with the text. It was too late to change the cover, which meant I was committed to providing the slides one way or another. I had forgotten that we had decided to do this at the start of the project, more than 18 months prior.
Eighteen months ago, PowerPoint was the de facto standard slide production technology within the academic community. Today, PDF is popular too. As with many in the Linux community, I already had made the move to OpenOffice.org, leaving PowerPoint behind. With 20 chapters in the book, I estimated it would take at least 20 days' effort to produce the slides manually. The thought of doing this work with PowerPoint was not something I relished. I could work within OpenOffice.org Impress, of course, and then export to PowerPoint when finished, but this idea didn't sit well with me, either. The basic problem was I knew all the content already was in the LaTeX files and having to reproduce it using a slide production application left me feeling even more drained than I already was. If only I could find a way to extract the content programmatically from my LaTeX files and populate PowerPoint slides with it—that would improve things considerably.
Searching Google resulted in frustration. Perhaps not surprisingly, details of the PowerPoint file format were hard to come by. I did find a file in Microsoft Windows Help format that described the XML standard for Microsoft Office documents, to which PowerPoint documents can be exported. Unfortunately, it was a large, complicated piece of writing. Having decided I wasn't going to get anywhere on Google, I surfed over to the Comprehensive Perl Archive Network (CPAN). Perl, my programming language of choice, has been hooked up to all types of file formats and other computing forms. If anyone had played with Perl and PowerPoint, details of the work would be available on CPAN. Unfortunately, this search also drew a blank.
Then it occurred to me: if I could work with the open and widely published OpenOffice.org Impress document format, I then could export my Impress slides to PowerPoint as a last step. A quick perusal of the OpenOffice.org Web site uncovered the official XML description of the OpenOffice.org file formats. Weighing in at more than 600 pages, the standard is bigger than my book!
The XML document is well written, but it's pretty heavy going. I surfed back to CPAN to see if any other programmers had taken the time to work with OpenOffice.org formats and were gracious enough to upload their work to CPAN. This time I wasn't disappointed. Jean-Marie Gouarne of Genicorp recently had released the OpenOffice::OODoc module, a Perl interface to the OpenOffice.org formats. Given an existing document, OpenOffice::OODoc can manipulate the content, adding to, deleting from and updating the disk file as need be.
I started with a simple filter, written in Perl, that takes a LaTeX file as input and produces the slide content as output in a customized textual form. By producing a text file, I ensured that any text editor could be used to edit the output from the filter, fine-tuning the textual content as necessary. Once I am happy with the textual content, another filter, also written in Perl, uses it to create an Impress presentation. The Impress presentation then can be opened in Impress and exported to PowerPoint and/or PDF format.
I made a conscious effort to keep my presentations as simple as possible and decided to have only three slide types. The title_slide would contain the title of the chapter at the start of the presentation file. Within the presentation, the title_slide would do double duty as a placeholder for any graphic images associated with the chapter, with one title_slide created per graphic image. The bullet_slide would contain section titles as its slide heading and subsection titles as bullet items. Finally, the sourcecode_slide would provide a mono-spaced, verbatim slide used for program listings.
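The three slide types can be pictured as plain Perl data. What follows is a minimal sketch, not the filter used for the book: it assumes the intermediate text uses tagged lines of the kind the getcontent filter prints (CHAPTERTITLE:, BULLETTITLE:, BULLETCONTENT:, GRAPHICCAPTION:, GRAPHICNAME:, STARTCODE and STOPCODE). The sub name parse_slides, the hash keys and the sample input are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn tagged lines into a list of slide records.
# Each record is a hash whose 'type' names one of the three slide types.
sub parse_slides
{
    my @lines = @_;
    my @slides;
    my $in_code = 0;

    for my $line ( @lines )
    {
        chomp $line;

        if ( $in_code )
        {
            # Inside a listing, keep every line verbatim until STOPCODE.
            if ( $line eq 'STOPCODE' ) { $in_code = 0; }
            else                       { push @{ $slides[-1]{lines} }, $line; }
            next;
        }

        if ( $line =~ /^CHAPTERTITLE: (.*)/ )
        {
            push @slides, { type => 'title_slide', title => $1 };
        }
        elsif ( $line =~ /^GRAPHICCAPTION: (.*)/ )
        {
            # One extra title_slide per graphic image.
            push @slides, { type => 'title_slide', title => $1 };
        }
        elsif ( $line =~ /^GRAPHICNAME: (.*)/ )
        {
            # Assumes the name line follows its caption line.
            $slides[-1]{graphic} = $1 if @slides;
        }
        elsif ( $line =~ /^BULLETTITLE: (.*)/ )
        {
            push @slides, { type => 'bullet_slide', title => $1, bullets => [] };
        }
        elsif ( $line =~ /^BULLETCONTENT: (.*)/ )
        {
            # A subsection becomes a bullet, but only when the most recent
            # slide is a bullet_slide; otherwise it is dropped in this sketch.
            push @{ $slides[-1]{bullets} }, $1
                if @slides && $slides[-1]{type} eq 'bullet_slide';
        }
        elsif ( $line eq 'STARTCODE' )
        {
            push @slides, { type => 'sourcecode_slide', lines => [] };
            $in_code = 1;
        }
    }
    return @slides;
}

# Made-up sample input, for illustration only.
my @slides = parse_slides(
    "CHAPTERTITLE: A Sample Chapter\n",
    "BULLETTITLE: A Sample Section\n",
    "BULLETCONTENT: First subsection\n",
    "BULLETCONTENT: Second subsection\n",
    "STARTCODE\n",
    "print \"Hello\\n\";\n",
    "STOPCODE\n",
);

printf "%-16s %s\n", $_->{type}, $_->{title} // '(listing)' for @slides;
```

Keeping the records as simple hashes means the second filter only has to look at the type key to decide which of the three template slides to fill.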
I used Impress to create a three-slide presentation manually, which I called blank.sxi. Each slide corresponded to one of the three slide types described in the previous paragraph. I planned to clone this presentation every time I programmatically created a presentation for one of my chapters. By cloning, I'd ensure that all of the presentations conformed to a standardized look and feel.
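The cloning step itself needs nothing more than a file copy. Here is a minimal sketch using the core File::Copy module; the sub name clone_template and the chapter01.sxi output name are assumptions made for illustration, and only blank.sxi comes from the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: copy the template presentation so that every
# chapter starts from the same three slides.
sub clone_template
{
    my ( $template, $target ) = @_;

    copy( $template, $target )
        or die "Cannot copy $template to $target: $!\n";

    return $target;
}

# Clone the template for chapter 1, but only if the template is present.
clone_template( 'blank.sxi', 'chapter01.sxi' ) if -e 'blank.sxi';
```

The copy then can be opened and populated, leaving blank.sxi itself untouched for the next chapter.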
Comments
Great ideas, thanks!
getcontent script
I am not a Perl programmer myself. Is there a way to obtain a working getcontent script?
Best
Missing getcontent script
Sorry ... the script appears to be missing from the download. Here it is:
#! /usr/bin/perl -w
#
# The "getcontent" script: Given a LaTeX file on the command-line,
# extract its textual content.
#
# By Paul Barry, paul.barry@itcarlow.ie
#
use strict;

use constant TRUE  => 1;
use constant FALSE => 0;

my $in_verbatim  = FALSE;   # inside a verbatim/alltt block?
my $in_maxim     = FALSE;   # inside a maxim block?
my $graphic_name = '';      # most recently seen image filename

while ( <> )
{
    # Inside a maxim: copy lines through until the environment ends.
    if ( $in_maxim )
    {
        if ( /\\end\{maxim\}/ )
        {
            print "STOPMAXIM\n";
            $in_maxim = FALSE;
        }
        else
        {
            print;
        }
        next;
    }

    # Inside a program listing: copy lines through until it ends.
    if ( $in_verbatim )
    {
        if ( /\\end\{verbatim\}/ || /\\end\{alltt\}/ )
        {
            print "STOPCODE\n";
            $in_verbatim = FALSE;
        }
        else
        {
            print;
        }
        next;
    }

    if ( /\\chapter\{(.*)\}/ )
    {
        print "CHAPTERTITLE: $1\n"; next;
    }
    if ( /\\section\{(.*)\}/ )
    {
        print "BULLETTITLE: $1\n"; next;
    }
    if ( /\\subsection\{(.*)\}/ )
    {
        print "BULLETCONTENT: $1\n"; next;
    }
    if ( /\\begin\{verbatim\}/ || /\\begin\{alltt\}/ )
    {
        print "STARTCODE\n";
        $in_verbatim = TRUE; next;
    }
    if ( /\\begin\{maxim\}/ )
    {
        print "STARTMAXIM\n";
        $in_maxim = TRUE; next;
    }

    # Remember the image filename; it is printed with its caption.
    if ( /images\/(.*?)\}/ )
    {
        $graphic_name = $1; next;
    }
    if ( /\\caption\{\\label\{/ )
    {
        /label\{.*?\}(.*)\}\}/;
        print "GRAPHICCAPTION: $1\n";
        print "GRAPHICNAME: $graphic_name\n"; next;
    }
    if ( /^\\textit\{(.*)\}/ )
    {
        print "CHAPTERCONTENT: $1\n"; next;
    }
}
Paul Barry
Some important updates to the OpenOffice::OODoc module
Jean-Marie Gouarné contacted me via e-mail with some updates on the status of his excellent Perl module. Here's what he said:
Thanks for this article. It's very useful for evangelization about the OOo XML format... And (that is much less important) thanks for your test with my OpenOffice::OODoc module!
However, I've just 2 remarks about your quotation of this Perl module:
1) OpenOffice::OODoc *can* create new OOo files (texts, spreadsheets, presentations and drawings) from scratch; this feature is available since version 1.201 (2004-07-30). To do so, the ooDocument() constructor must be called with a create => $class option (where $class is the document class, i.e. "text", "spreadsheet", etc).
2) The module has notably evolved in the meantime; now it supports both the OpenOffice.org 1.0 and the OpenDocument formats; in addition, there are a few draw- or impress-focused methods (so, for example, such methods as insertDrawPage or appendDrawPage are available in order to organize and copy presentation slides). But you were right when you said that "the module was created with a view to working primarily with OpenOffice.org Writer files". Text documents were and remain the main target.
I thought it worthwhile to post his message here. Thanks.
--
Paul Barry
IT Carlow, Ireland
http://glasnost.itcarlow.ie/~barryp
Paul Barry
Writing to Impress from Perl
An easy way to write to Impress/PowerPoint, as Jean-Marie said:
#! /usr/bin/perl -w
use strict;
use OpenOffice::OODoc;

# start a new preso
my $preso = ooDocument(file => 'test.sxi', create => 'presentation');
my $slide = $preso->getDrawPage(0); # slide 0
$preso->createTextBox
(
    attachment => $slide,
    size       => '10cm, 2cm',
    position   => '1cm, 2cm',
    content    => 'I want to write to Impress from Perl'
);
$preso->save;
Programmatic Conversions?
Thanks for the great article! Do you know if there is a way to programmatically convert the resulting Impress document to PowerPoint? Perhaps as in $preso->export(...)?
Or would I need to use something like the Python-UNO bridge to do so?
image extraction from pdf's
Would "pdfimages" (part of the xpdf package) have sped up the final step (extraction of images from the pdf page proofs)?
using pdfimages?
> sped up the final step?
maybe ... if I had known about it! :-)
Thanks - I'll check out pdfimages.
Paul.
Paul Barry