OpenOffice.org ODF, Python and XML
That exercise proved the concept, so now we can get to work. My wife's poetry book was about 60 pages long, and it needed these issues addressed:
Those straight quotes, which came from plain-text e-mail messages or other word processors.
Apostrophes (or single quotes), which also were straight rather than curled the right way.
Double hyphens and shorter dashes (the en dash), which should all be changed into the longer em dash.
OpenOffice.org Writer has keystroke sequences for creating the en dash as well as the longer em dash. Sometimes the wrong sequence was typed, so an en dash appeared instead of the desired em dash. Plain text imported from e-mail messages sometimes had double hyphens (that is, --).
Concretely, we want to transform what's shown in Figure 5 into what's shown in Figure 6.
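To make the goal concrete, here is a minimal Python sketch of the text-level changes listed above. It is only an illustration under simple assumptions, not the code this article ends up with: the name fix_text and the rule for deciding which quotes are "opening" (start of text, or after whitespace, an opening bracket, a dash or an opening double quote) are my own, and that rule gets a leading apostrophe, as in 'tis, wrong.

```python
import re

LDQUO, RDQUO = "\u201c", "\u201d"      # curly double quotes
LSQUO, RSQUO = "\u2018", "\u2019"      # curly single quotes; RSQUO is also the apostrophe
EN_DASH, EM_DASH = "\u2013", "\u2014"

# Assumption: a quote opens if it starts the text or follows one of these.
OPENERS = r"(^|[\s(\[{" + EM_DASH + LDQUO + "])"


def fix_text(text):
    """Return text with dashes lengthened and straight quotes curled."""
    # Double hyphens and en dashes both become em dashes.
    text = text.replace("--", EM_DASH).replace(EN_DASH, EM_DASH)
    # Opening double quotes first; whatever is left must be closing.
    text = re.sub(OPENERS + '"', r"\1" + LDQUO, text)
    text = text.replace('"', RDQUO)
    # Same idea for single quotes; apostrophes fall into the "closing" case.
    text = re.sub(OPENERS + "'", r"\1" + LSQUO, text)
    text = text.replace("'", RSQUO)
    return text
```

Doing the dashes first matters a little: a quote that follows a freshly made em dash is then treated as an opening quote.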
Let's develop the automated script in two pieces, and let's do it top-down. The top layer will create a temporary directory, unpack the original document and then run the bottom layer, a program (designated fixit.py) to modify content.xml. Afterward, it will pack up the files into the new document and clean up.
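Before getting to the real script, the unpack, modify and repack flow just described can be sketched in a few lines of Python. This is only an illustration of the steps, not the script used in this article; the names fix_document and fix_content are hypothetical, and fix_content stands in for the bottom layer as a function from bytes to bytes.

```python
import os
import tempfile
import zipfile


def fix_document(old_path, new_path, fix_content):
    """Unpack old_path, run fix_content on content.xml, pack into new_path."""
    # The temporary directory is removed automatically when the block ends.
    with tempfile.TemporaryDirectory(prefix="ODFfixit.") as tmpdir:
        with zipfile.ZipFile(old_path) as zin:
            names = zin.namelist()          # remember the original file list
            if "content.xml" not in names:
                raise ValueError("content.xml not in " + old_path)
            zin.extractall(tmpdir)

        # Rewrite content.xml in place with the modified bytes.
        content_path = os.path.join(tmpdir, "content.xml")
        with open(content_path, "rb") as f:
            data = f.read()
        with open(content_path, "wb") as f:
            f.write(fix_content(data))

        # Pack the same members, in the same order, into the new document.
        with zipfile.ZipFile(new_path, "w", zipfile.ZIP_DEFLATED) as zout:
            for name in names:
                zout.write(os.path.join(tmpdir, name), arcname=name)
```

Passing the identity function as fix_content should give back a document with the same content as the original, which makes a handy first test.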
I want to use the highest-level language reasonable for each task; for this top layer, that's probably the shell. This script, called fixit.sh, turned out to be longer than I thought it would be, mostly because of all the error checking:
#!/bin/bash
# Script to fix up OpenDocument Text (.odt) files
# "cd" to the directory containing "fixit.py".

# Make $TMPDIR, a new temporary directory
TMPDIR=/tmp/ODFfixit.$(date +%y%m%d.%H%M%S).$$
if rm -rf $TMPDIR && mkdir $TMPDIR; then
    : # Be happy
else
    echo >&2 "Can't (re)create $TMPDIR; aborting"
    exit 1
fi

OLDFILE=$1
NEWFILE=$2

# Check number of parameters.
# Ensure $NEWFILE's dir exists and is writable.
# Quietly Unzip $OLDFILE. Whine and abort on error.
if [[ $# -eq 2 ]] && touch $NEWFILE && rm -f $NEWFILE &&
   unzip -q $OLDFILE -d $TMPDIR ; then
    : # All good; be happy.
else
    # Trouble! Print usage message, clean up, abort.
    echo >&2 "Usage: $0 OLDFILE NEWFILE"
    echo >&2 " ... both OpenDocument Text (odt) files"
    echo >&2 "Note: 'OLDFILE' must already exist."
    rm -rf $TMPDIR
    exit 1
fi

# Save file list in $F; is content.xml there?
F=$(unzip -l $OLDFILE | sed -n '/:[0-9][0-9]/s|^.*:.. *||p')
if echo "$F" | grep -q '^content\.xml$'; then
    : # Good news; we have content.xml
else
    echo >&2 "content.xml not in $OLDFILE; aborting"
    echo >&2 TMPDIR is $TMPDIR
    exit 1
fi

# Now invoke the Python program to fix content.xml
mv $TMPDIR/content.xml $TMPDIR/OLDcontent.xml
if ./fixit.py $TMPDIR/OLDcontent.xml > \
       $TMPDIR/content.xml; then
    : # It worked.
else
    echo >&2 "fixit.py failed in $TMPDIR; aborting"
    exit 1
fi

if (cd $TMPDIR; zip -q - $F) | cat > $NEWFILE; then
    # Everything worked! Clean up $TMPDIR
    rm -rf $TMPDIR
else
    # something Bad happened.
    echo >&2 "zip failed in $TMPDIR on $F"
    exit 1
fi
It's long but straightforward, so I explain only a few things here.
First, the temporary directory name includes the date and time (the date +% stuff) and the shell's process ID (the $$), which together prevent name collisions.
Second, the grep line looks the way it does because I want it to accept content.xml but not something like discontent.xml or content-xml.
Finally, we clean up the temporary directory ($TMPDIR) except in some error cases, where we leave it intact for debugging and tell the user where it is.
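The point about the grep pattern is easy to see in a few lines of Python. This is just a sketch to show why the anchors and the escaped dot matter; the function name has_content_xml and the LOOSE pattern are made up for the comparison and appear nowhere in fixit.sh.

```python
import re

# Equivalent of grep's '^content\.xml$': the whole line must be content.xml.
STRICT = re.compile(r"content\.xml")
# What you get without anchors or the backslash: "." matches any character.
LOOSE = re.compile(r"content.xml")


def has_content_xml(listing):
    """Return True if some line of listing is exactly content.xml."""
    return any(STRICT.fullmatch(line) for line in listing.splitlines())


def loosely_has_content_xml(listing):
    """The sloppy version, shown only for contrast."""
    return any(LOOSE.search(line) for line in listing.splitlines())
```

The strict check rejects both discontent.xml and content-xml, while the loose one accepts them.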
We can't run this script for real yet, because we don't yet have a fixit.py that actually modifies content.xml. But we can use a stub to validate what we have so far. The fixit.sh script assumes fixit.py will take one parameter (the original content.xml's pathname) and put the result onto stdout. This just happens to match the calling sequence for /bin/cat with one parameter; hence, if we use /bin/cat as our fixit.py, fixit.sh should give us a new document with the same content as the old. So, let's give it a whirl:
% ln -s /bin/cat fixit.py
% ./fixit.sh ex1.odt foo.odt
% ls -l ex1.odt foo.odt
-rw-r--r-- 1 collin users 7839 2006-11-14 17:50 ex1.odt
-rw-r--r-- 1 collin users 7900 2006-11-14 19:45 foo.odt
% oowriter foo.odt
The new file, foo.odt, is slightly larger than ex1.odt, but when I looked at it with OpenOffice.org Writer, it had the right stuff.
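If you'd rather not symlink /bin/cat, a stub written in Python honors the same calling sequence: one pathname in, unchanged bytes out on stdout. The sketch below is my own stand-in, not part of the original exercise; the helper name copy_through is hypothetical, and the file would need to be saved as fixit.py and made executable for fixit.sh to run it.

```python
#!/usr/bin/env python3
"""Stub fixit.py: copy the named file to stdout without changing it."""
import shutil
import sys


def copy_through(path, out):
    """Copy the file at path to the binary stream out, byte for byte."""
    with open(path, "rb") as src:
        shutil.copyfileobj(src, out)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Called the way fixit.sh calls it: one pathname, result on stdout.
    copy_through(sys.argv[1], sys.stdout.buffer)
```

Working in bytes rather than text means the stub can't disturb the document's encoding.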
As far as writing a program for manipulating content.xml—well, back in the 1990s, I probably would have spent many hours with yacc (or bison)—but today, Python with its XML libraries is a more natural choice.
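To give a feel for what "Python with its XML libraries" can look like, here is a small sketch using the standard xml.dom.minidom module. It is an assumption about the general shape of such a program, not the fixit.py developed in this article: the names fix_node and fix_xml are hypothetical, it handles only the double-hyphen case, and it won't catch a pair of hyphens that is split across two text nodes.

```python
import xml.dom.minidom

EM_DASH = "\u2014"


def fix_node(node):
    """Replace double hyphens in every text node at or below node."""
    if node.nodeType == node.TEXT_NODE:
        node.data = node.data.replace("--", EM_DASH)
    for child in node.childNodes:
        fix_node(child)


def fix_xml(xml_bytes):
    """Parse XML from bytes, fix its text nodes, return UTF-8 bytes."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    fix_node(dom)
    return dom.toxml(encoding="utf-8")
```

Because only text nodes are touched, element names and attribute values pass through unchanged.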