Converting e-Books to Open Formats
Listing 2. OPF is an XML-based format for book attributes.
<dc:Title>A Midsummer-Night's Dream</dc:Title>
<dc:Creator role="aut"
file-as="Shakespeare, William, 1564-1616">
William Shakespeare, 1564-1616
</dc:Creator>
<dc:Description>fiction, poetry</dc:Description>
The practical consequence is that Convert Lit could be useful even if you wanted to leave your whole collection in a proprietary format. You still could run the program on all your .lit e-books and delete everything but the .opf files. Then, any quick script or full-blown XML parsing utility could scan them and index everything into the database of your choice.
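As a rough illustration of that kind of quick script, here is a minimal Python sketch that reads the title, creator and description from each .opf file under a directory and stores them in an SQLite table. The table name books, the function names and the choice of SQLite are assumptions made for this example, not anything Convert Lit prescribes. The sketch also assumes each .opf file is well-formed XML that declares the dc namespace prefix.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path

# Metadata elements to collect, compared case-insensitively (dc:Title, dc:title).
FIELDS = ("title", "creator", "description")


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1].lower()


def read_opf(xml_data):
    """Return a dict holding the first title, creator and description found."""
    record = dict.fromkeys(FIELDS, "")
    for element in ET.fromstring(xml_data).iter():
        name = local_name(element.tag)
        if name in record and not record[name]:
            # Collapse the whitespace left by elements wrapped over several lines.
            record[name] = " ".join((element.text or "").split())
    return record


def index_opf_files(directory, connection):
    """Insert or update one row per .opf file found under directory."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS books "
        "(path TEXT PRIMARY KEY, title TEXT, creator TEXT, description TEXT)"
    )
    for path in sorted(Path(directory).rglob("*.opf")):
        record = read_opf(path.read_bytes())
        connection.execute(
            "INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?)",
            (str(path), record["title"], record["creator"], record["description"]),
        )
    connection.commit()
```

Calling index_opf_files("ebooks", sqlite3.connect("ebooks.db")) would then fill a database file that any SQL client can query. Matching on the local element name, rather than on one fixed namespace URI, is a deliberate shortcut so the sketch does not depend on which Dublin Core version a given file declares.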
Convert Lit also removes digital rights management (DRM) infections from e-book files using the older DRM1 version. And if you have Microsoft Reader e-books, you likely have a Microsoft Windows system and a licensed copy of Microsoft Reader. According to the Convert Lit Web site, you can build and run Convert Lit on Windows to first convert new DRM5 e-books to DRM1, using the Windows DRM library.
In general, we have discussed only command-line tools in this article. The advantage is that if you have a whole collection of e-books in different formats, you can convert them all at one time with a simple shell script. As we already have shown, once the text is in ASCII or HTML format, the sky is the limit. You can add one or two lines to the loop to index with glimpse or ht://Dig, print everything in one single PostScript book and much more.
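The article has a shell loop in mind; the same idea can be sketched in Python, which makes the per-format dispatch explicit. Everything named here is an assumption for illustration: the CONVERTERS table, the function names, and in particular the clit argument order (input file, then output directory), which should be checked against the tool's own usage message before running anything. By default the sketch only prints the commands it would run.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to a command template.
# "{src}" is the input file, "{out}" a per-book output directory.
CONVERTERS = {
    ".lit": ["clit", "{src}", "{out}/"],
}


def plan_conversions(source_dir, output_dir, converters=CONVERTERS):
    """Return (source, output directory, command) for each convertible file."""
    plans = []
    for src in sorted(Path(source_dir).iterdir()):
        template = converters.get(src.suffix.lower())
        if template is None or not src.is_file():
            continue  # unknown format or not a regular file: skip it
        out = Path(output_dir) / src.stem
        command = [part.format(src=src, out=out) for part in template]
        plans.append((src, out, command))
    return plans


def convert_all(source_dir, output_dir, converters=CONVERTERS, dry_run=True):
    """Print every planned command and, unless dry_run is set, execute it."""
    for src, out, command in plan_conversions(source_dir, output_dir, converters):
        print(" ".join(command))
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            subprocess.run(command, check=True)
            # An indexing or printing step for the converted text would go here.
```

Adding another format means adding one entry to CONVERTERS, and the comment inside the loop marks where the "one or two lines" for indexing or printing would fit.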
A solution for putting e-books, at least the ones you will be able to get in the near future, into an open format is in the works. It is the Open eBook Publication Structure (OEBPS). Its goal is to provide an XML-based specification, based on existing open standards, for providing content to multiple e-book platforms. OEBPS, which has reached version 1.2, is maintained by the Open eBook Forum, a group of over 85 organizations—hardware and software companies, publishers, authors and users—involved in electronic publishing. OEBPS itself does not directly address DRM. However, an OeBF Rights and Rules Working Group is studying these issues “to provide the electronic publishing community with a consistent and mutually supporting set of specifications”. Time will tell what will come from this.
In any case, the open standards on which OEBPS is built already are well established. Besides XML, they include Unicode, XHTML and selected parts of the CSS1 and CSS2 specifications. Unicode is a family of encodings that enables computers to handle tens of thousands of characters without ambiguity. XHTML is the reformulation of HTML 4 as XML. In a nutshell, OEBPS could be described as nothing more than an e-book-optimized extension of XHTML, something that won't go away when some company goes out of business. Graphics can be in PNG or JPEG formats. Metadata, including author, title, ISBN and so on, will be managed through the Dublin Core vocabulary.
OEBPS has the potential to preserve all your e-books and make sure that the ones you download or buy will not vanish if any hardware or software company goes the way of the dodo. However, DRM schemes applied on top of these “open” e-books still could lock your content in to one vendor. As long as you can obtain OEBPS e-books without DRM, OEBPS is the best way to guarantee that even if all current e-book hardware disappeared, your collection would remain usable.
Resources for this article: /article/8208.
Marco Fioretti is a hardware systems engineer interested in free software both as an EDA platform and, as the current leader of the RULE Project, as an efficient desktop. Marco lives with his family in Rome, Italy.
Articles about Digital Rights and more at http://stop.zona-m.net. CV, talks and bio at http://mfioretti.com.
Comments
Pyrite Publisher
Evidently not all pdb files are created equal, and Pyrite has a certain aversion to Peanut files (PNRdPPrs). It cannot open these files.
error: couldn't find a way to convert from PDB:PPrs/PNRd
I have scoured the web looking for an alternative and have yet to find one.
There is another *open* format, and a reader
Regarding the author's article in general, and this comment in particular...
"An ebook supplier could charge enough to cover royalties, operating expense and a modest profit. In a model where the ebooks are all in a standard format, they could be formatted as needed, including drm controls, on demand to any format needed by the client for whatever ebook format they needed."
...(and please excuse me that for now, the app is Windows-based).
There is another "open standard" which has been around for a while, but ignored; it is called OpenReader Format.
Meanwhile, there was a free but proprietary app, using its proprietary format, called ThoutReader. It was an awesome app (and format) for the Windows platform! It allowed you to do all this, and more:
There were many free and some decent commercial e-books. The best free ones were technical manuals, such as those for PHP and MySQL. Accounts are free, but credit cards are needed to buy commercial e-books and to upload public "notes".
Then the creators, OSoft, were hired by "Teachers Without Borders" to customize the app for them. At that time, they decided to a) go open-source, b) support the OpenReader Format, and c) rename the app "dotReader" after a pioneering woman in computer science.
dotReader has an area that can be customized via application customization, or by the e-book itself (to support a small advertisement area and bring down the costs of textbooks). dotReader has the old features and some additions. A conversion tool is being developed. It reads OpenReader Format, the old ThoutReader format (buggy, though, depending on the particular ebook), and a few other formats.
Now that it is open-source, someone will port it to Linux, I'm sure. Meanwhile, those who already sold their commercial e-books in ThoutReader format should be the first to offer them in OpenReader Format as well. dotReader allows for DRM support while maintaining an open file format.
Reportedly, textbook publishers really don't like the current cost situation either. Their high costs in the U.S. (as compared to U.K.) are apparently because of the way a traditional sales force is employed (even required) to get the textbooks used by academia and offered in college bookstores, and because of the high travel expenses in a much bigger country.
These publishers' stated goal, reportedly, is to keep textbooks $30 and under.
These developments should give the industry, and software makers, the kick in the pants they need. This is great for the industry and awesome for consumers!
You can download dotReader from osoft.com
My reader
I am currently writing my own ebook reader, and for that I am using a subset of HTML as the source format. I am interested in sharing ideas on what should be in such a subset (the trick is to support all essential features without making it complex to handle). Feel free to mail me (jonas ALPHA nightmode DOT org) if you are interested in this subject.
To comment on the existing formats, I must say that PDF is one of the worst because of the great difficulty of converting it back to something manageable, even when it is not DRM-protected. And from what I've seen of OPF, it looks too big and over-engineered to be handled smoothly.
A format for fiction, say, does not need that much formatting. It should not be necessary to implement HTML 4.0 just to parse running text...
MobiPocket
Are MobiPocket and Rocket Ebook files really the same thing? Rocket Ebook appears to be a non-DRM'ed format, whereas MobiPocket files are DRM-ed. At least the stores that I've seen selling MobiPocket files appear to need the key off the MobiPocket software you're using (extracted from the PDA) and the files appear to be keyed to this software.
Will the rbmake utilities really touch DRM-ed MobiPocket files?
When open formats are not reader-friendly
This is excellent advice for all e-book lovers dreaming of re-reading a book in maybe ten years' time!
Just a note on the side: formats must not only be free but also user-friendly. The Gutenberg project makes so many good books available but they take a lot of time reformatting if you actually want to read them. Here, simple raster images or bit-maps might be the solution. On www.books4free.org you can find copyright-free books in a reader-friendly format.
> Here, simple raster images
> Here, simple raster images or bit-maps might be the solution.
Blind people could not read these on their braille displays.
for CHM files...
archmage (archmage.sf.net) works great to convert to html and xchm (xchm.sf.net) is a great viewer
Non public book-formats
I really hate these proprietary file formats. I have to install a new viewer all the time 8-(((
The only open format in my eyes is the PDF format. I don't know ... do all the compatible printer drivers have to pay license fees to Adobe?
Re: Non public ...
Open Office uses the PDF-ESImpress. So it looks to me as though there is no patent on this format.
hardware ebook readers
As the owner of an RCA ebook reader, I can say that the hardware is an excellent way to carry large volumes of reference material in a portable manner. Gemstar shot itself in the foot by pricing the ebook versions too high. I use rbmake to convert html ebooks for my reb1100 and it works quite well. However, many titles that I need are in pdf format and are not readily converted to html for input to rbmake. I like the idea of a common open standard that can be readily converted to whatever proprietary standard is needed.
On the flip side, the publishing industry favors DRM and proprietary standards. The main problem is that the digital rights are controlled by the publisher and not by the original author/artist. The publishing industry should wake up and realize that the average ebook reading joe can figure out that the ebook publisher doesn't have the manufacturing and distribution costs associated with producing paperback copies of the same material. An ebook supplier could charge enough to cover royalties, operating expense and a modest profit. In a model where the ebooks are all in a standard format, they could be formatted as needed, including drm controls, on demand to any format needed by the client for whatever ebook format they needed.
Same to me: you REALLY miss plucker
with plucker I've got the web in my palm...
From the author, about: missing plucker
you REALLY miss plucker
No. Plucker is presented as "the best offline Web and e-book viewer for PalmOS"; that's why, even though I knew it, I didn't mention it.
The theme of the article is how to convert existing ebooks in PDA-only and/or proprietary formats to something open like plain text, HTML, OASIS or (in the future) OEBPS.
Not how to read or port content in open formats to a PDA.
Ciao,
Marco Fioretti
Regarding PalmOS Platform you
Regarding PalmOS Platform you miss plucker...