Searching PDF Files With grep
Comments
Why not use strings for the same effect with more portability?
for a in *.pdf; do
    strings "$a" | grep Copyleft
done
Because It Doesn't Work
I'm not sure if it never works, but it doesn't always work. PDF files contain compressed data and the characters of a word may not always be right next to each other in the data. If I use strings I can't find things that I could find with pdftotext:
Mitch Frazier is an Associate Editor for Linux Journal.
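The difference described above can be checked on your own files with a small side-by-side comparison. This is only a minimal sketch, not the output from the original comment: the function names are hypothetical, and it assumes strings, grep and pdftotext are installed.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two ways of searching a PDF.

# Count matching lines among the raw printable byte sequences in the file.
count_with_strings() {
    strings "$2" | grep -c -- "$1"
}

# Count matching lines in the text extracted by pdftotext
# ("-" tells pdftotext to write the text to standard output).
count_with_pdftotext() {
    pdftotext "$2" - | grep -c -- "$1"
}

# compare_pdf_search PATTERN FILE...
# Print both counts for each file so the two methods can be compared.
compare_pdf_search() {
    pattern=$1
    shift
    for f in "$@"; do
        # grep -c exits non-zero when the count is 0, so ignore its status.
        s=$(count_with_strings "$pattern" "$f") || true
        p=$(count_with_pdftotext "$pattern" "$f") || true
        printf '%s: strings=%s pdftotext=%s\n' "$f" "$s" "$p"
    done
}
```

Calling compare_pdf_search Copyleft *.pdf prints one line per file. A file where the strings count is 0 but the pdftotext count is not is one where the word is stored compressed or split up inside the PDF.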
CentOS (Red Hat) uses "poppler-utils"
Installing it with YUM pulls in both "poppler" and "poppler-utils".
THANKS FOR THIS NICE TIP!
great tip thanks
That was very useful. On Fedora I found xpdf, but no poppler and no xpdf-tools. Oh yes, it has pdftotext in it. Thanks! -t
SuSE PDF Package
When I search rpmfind.net, poppler-tools only provides rendering libraries (which requires poppler, based on the xpdf-3.0 code base). The utilities themselves are not provided in any poppler-tools rpm I can find. On my OpenSuSE 11 system, pdftotext is provided by xpdf-tools.
From http://poppler.freedesktop.org:
"Poppler is a PDF rendering library based on the xpdf-3.0 code base."
Check OpenSuSE's site
You can find the RPMs on opensuse.org:
http://download.opensuse.org/repositories/openSUSE:/11.0/standard/i586/poppler-tools-0.8.2-1.1.i586.rpm
http://download.opensuse.org/repositories/openSUSE:/11.0/standard/x86_64/poppler-tools-0.8.2-1.1.x86_64.rpm
I also use OpenSuSE 11.0:
Mitch Frazier is an Associate Editor for Linux Journal.
The poppler-tools RPM
The poppler-tools RPM doesn't appear to be the standard (yet -- see below), therefore it may not be available on all distros (which is probably why it's not on rpmfind.net). I have SLED11, based on OpenSuSE 11, but there is no poppler-tools RPM. I was not able to check Fedora, Red Hat or Debian-based distros (e.g. Ubuntu, Knoppix, etc).
However, like I said, all of the contents of poppler-tools are included in xpdf-tools (with the exception of pdftohtml and pdftoabw). This is available from rpmfind.net for a few distros and appears to be more readily available for the more common distros.
The "rpm --whatprovides..." in your tip would still apply regardless of what the source RPM is; however, I mention xpdf-tools to help users who may be confused by their own results of what is returned by --whatprovides, which may not necessarily be poppler-tools. The functionality of the pdfto[whatever] tools appears to be similar, though.
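To see which package supplies pdftotext on a given RPM-based system, something along these lines may help. It is a sketch under assumptions: the function name is hypothetical, and it uses rpm -qf (query the package owning a file) rather than the exact command from the tip.

```shell
#!/bin/sh
# which_rpm_provides TOOL
# Hypothetical helper: report which installed RPM package owns TOOL.
which_rpm_provides() {
    tool=$1
    # Resolve the tool name to a full path; give up if it is not installed.
    path=$(command -v "$tool") || {
        echo "$tool: not found in PATH" >&2
        return 1
    }
    # rpm -qf asks the RPM database which installed package owns a file.
    rpm -qf "$path"
}
```

Running which_rpm_provides pdftotext should name poppler-tools, poppler-utils, xpdf-tools or xpdf, depending on the distro.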
Result of rpm -qp poppler-tools-0.8.2-1.1.i586.rpm --info:
Poppler is a fork of the xpdf PDF viewer developed by Derek Noonburg of
Glyph and Cog, LLC. The purpose of forking xpdf is twofold. First, to
provide PDF rendering functionality as a shared library to centralize
the maintenence effort. Today a number of applications incorporate the
xpdf code base and whenever a security issue is discovered, all these
applications exchange patches and put out new releases. In turn, all
distributions must package and release new versions of these xpdf based
viewers. It is safe to say that there is a lot of duplicated effort
with the current situation. Even if poppler in the short term
introduces yet another xpdf-derived code base to the world, it is hoped
that over time these applications will adopt poppler. After all, we
only need one application to use poppler to break even.
Second, we would like to move libpoppler forward in a number of areas
that do not fit within the goals of xpdf. By design, xpdf depends on
very few libraries and runs on a wide range of X-based platforms. This
is a strong feature and reasonable design goal. However, poppler
intends to replace parts of xpdf that are now available as standard
components of modern Unix desktop environments. One such example is
fontconfig, which solves the problem of matching and locating fonts on
the system in a standardized and well understood way. Another example
is cairo, which provides high quality 2D rendering. See the file TODO
for a list of planned changes.
So, as with most things in the whacky world of open-source, it would appear poppler-tools is trying to move away from xpdf-based code and rely solely on libpoppler. This basically gives users a choice of which PDF library implementation to use.
a couple of quick scripts
Thanks for the tip, that's a great and elegant solution.
Going one step further, I threw together these two scripts, one for grepping a single PDF, the other for grepping a directory of PDFs.
http://iquaid.org/programs/grep-pdf
http://iquaid.org/programs/grep-pdf-multi
Typical bash script, more copyright than code. Let me know if you make any improvements, I'd like to use them, too. :)
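For readers who cannot reach the links, the general idea can be sketched as two small shell functions. These are not the scripts linked above: the names grep_pdf and grep_pdf_dir are made up for illustration, and the sketch assumes pdftotext is installed.

```shell
#!/bin/sh
# grep_pdf PATTERN FILE
# Search the extracted text of a single PDF.
grep_pdf() {
    pdftotext "$2" - 2>/dev/null | grep -- "$1"
}

# grep_pdf_dir PATTERN [DIR]
# Search every *.pdf in DIR (default: current directory) and
# prefix each matching line with the name of the file it came from.
grep_pdf_dir() {
    pattern=$1
    dir=${2:-.}
    for f in "$dir"/*.pdf; do
        # Skip the unexpanded glob when the directory holds no PDFs.
        [ -f "$f" ] || continue
        pdftotext "$f" - 2>/dev/null | grep -- "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

For example, grep_pdf_dir Copyleft ~/docs lists every matching line under ~/docs together with its file name. It does not descend into subdirectories.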