Searching PDF Files With grep




Why not use strings for the same effect, with more portability?

ojblass

for a in *.pdf; do
  strings "$a" | grep Copyleft
done

Because It Doesn't Work

Mitch Frazier

I'm not sure it never works, but it doesn't always work. PDF files usually contain compressed data, and even in uncompressed text the characters of a word may not sit right next to each other in the file. With strings I can't find things that I can find with pdftotext:

  $ strings datasheet.pdf | grep Charac
  #### nothing found ####

  $ pdftotext datasheet.pdf - | grep Charac
  ... Part 27pF Tech. Characteristics 50V-20% Ceramic Package CASE 0603
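To illustrate the compression point, here is a small sketch that uses gzip as a stand-in for a PDF's compressed content streams. The assumption is only that both use DEFLATE (gzip does, and so does the PDF FlateDecode filter); no real PDF is involved. The same phrase is found by strings when stored as plain text and missed once it is compressed:

```shell
#!/usr/bin/env bash
# Illustration only: gzip output stands in for a Flate-compressed PDF
# content stream (both use DEFLATE); no actual PDF is involved.
text='Tech. Characteristics 50V-20% Ceramic Package'

# Stored as plain text, the phrase is one printable run that strings prints.
plain_hits=$(printf '%s\n' "$text" | strings | grep -c Charac)

# After DEFLATE compression the letters are no longer stored byte-for-byte,
# so strings has no readable "Charac" to hand to grep.
packed_hits=$(printf '%s\n' "$text" | gzip -c | strings | grep -c Charac)

echo "plain text matches:      $plain_hits"
echo "compressed text matches: $packed_hits"
```

pdftotext avoids the problem because it decompresses the streams and reassembles the text before grep sees it.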

Mitch Frazier is an Associate Editor for Linux Journal.

CentOS (Red Hat) uses "poppler-utils"

Joe Mama

CentOS (Red Hat) packages pdftotext in "poppler-utils". Installing it with YUM installs both "poppler" and "poppler-utils".

Great tip, thanks

turgut

That was very useful. On Fedora I found xpdf, but no poppler and no xpdf-tools. And yes, it includes pdftotext. Thanks! -t

SuSE PDF Package

Anonymous

When I search, poppler-tools only provides rendering libraries (which require poppler, based on the xpdf-3.0 code base). The utilities themselves are not in any poppler-tools RPM I can find. On my OpenSuSE 11 system, pdftotext is provided by xpdf-tools.

"Poppler is a PDF rendering library based on the xpdf-3.0 code base."

Check OpenSuSE's site

Mitch Frazier

You can find the RPMs on

I also use OpenSuSE 11.0:

  $ cat /etc/SuSE-release
  openSUSE 11.0 (X86-64)
  VERSION = 11.0

  $ rpm -q -l poppler-tools


The poppler-tools RPM

Anonymous

The poppler-tools RPM doesn't appear to be the standard yet (see below), so it may not be available on all distros (which is probably why it's not on ...). I have SLED11, which is based on OpenSuSE 11, but it has no poppler-tools RPM. I was not able to check Fedora, Red Hat, or Debian-based distros (e.g., Ubuntu, Knoppix).

However, as I said, everything in poppler-tools is also included in xpdf-tools (except pdftohtml and pdftoabw). xpdf-tools is available from ... for a few distros and appears to be more readily available for the more common ones.

The "rpm --what-provides..." in your tip still applies regardless of which source RPM supplies the tools. I mention xpdf-tools only to help users who may be confused when --what-provides returns something other than poppler-tools. The pdfto[whatever] tools appear to work similarly either way.

Result of rpm -qp poppler-tools-0.8.2-1.1.i586.rpm --info:

Poppler is a fork of the xpdf PDF viewer developed by Derek Noonburg of
Glyph and Cog, LLC. The purpose of forking xpdf is twofold. First, to
provide PDF rendering functionality as a shared library to centralize
the maintenence effort. Today a number of applications incorporate the
xpdf code base and whenever a security issue is discovered, all these
applications exchange patches and put out new releases. In turn, all
distributions must package and release new versions of these xpdf based
viewers. It is safe to say that there is a lot of duplicated effort
with the current situation. Even if poppler in the short term
introduces yet another xpdf-derived code base to the world, it is hoped
that over time these applications will adopt poppler. After all, we
only need one application to use poppler to break even.

Second, we would like to move libpoppler forward in a number of areas
that do not fit within the goals of xpdf. By design, xpdf depends on
very few libraries and runs on a wide range of X-based platforms. This
is a strong feature and reasonable design goal. However, poppler
intends to replace parts of xpdf that are now available as standard
components of modern Unix desktop environments. One such example is
fontconfig, which solves the problem of matching and locating fonts on
the system in a standardized and well understood way. Another example
is cairo, which provides high quality 2D rendering. See the file TODO
for a list of planned changes.

So, as with most things in the wacky world of open source, it appears that poppler-tools is moving away from xpdf-based code toward relying solely on libpoppler. This effectively gives users a choice of which PDF library implementation to use.
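Because pdftotext may come from poppler-tools, poppler-utils, or xpdf-tools depending on the distro, it can help to ask the package database which package owns the binary you actually have. A minimal sketch, assuming either an RPM-based system (rpm -qf) or a Debian-based one (dpkg -S); the function name pdftotext_owner is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which installed package owns pdftotext.
# Assumes rpm (RPM-based distros) or dpkg (Debian-based distros).
pdftotext_owner() {
    local bin
    # Resolve the pdftotext actually found on $PATH.
    bin=$(command -v pdftotext) || { echo "pdftotext not installed" >&2; return 1; }

    if command -v rpm >/dev/null 2>&1; then
        # Prints the owning package, e.g. poppler-tools, poppler-utils or xpdf-tools.
        rpm -qf "$bin"
    elif command -v dpkg >/dev/null 2>&1; then
        # Prints "package: /path/to/pdftotext".
        dpkg -S "$bin"
    else
        echo "neither rpm nor dpkg found" >&2
        return 1
    fi
}
```

Querying by file path sidesteps the question of which source RPM the tools came from.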

A couple of quick scripts

quaid

Thanks for the tip; that's a great and elegant solution.

Taking it one step further, I threw together two scripts: one for grepping a single PDF, the other for grepping a directory of PDFs.

Typical bash scripts, more copyright than code. Let me know if you make any improvements; I'd like to use them, too. :)
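A rough sketch of the same idea as two bash functions, one for a single PDF and one for a directory. These are hypothetical (the names pdf_grep_file and pdf_grep_dir are illustrative, not quaid's actual scripts), they assume pdftotext is installed, and the --label option assumes GNU grep:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the original scripts. Assumes pdftotext
# (poppler-utils/poppler-tools or xpdf) and GNU grep (for --label).

# Grep one PDF: pdf_grep_file FILE PATTERN [GREP_OPTIONS...]
pdf_grep_file() {
    local file=$1 pattern=$2
    shift 2
    # "-" tells pdftotext to write the extracted text to stdout;
    # --label makes grep -H print the PDF's name instead of "(standard input)".
    pdftotext "$file" - 2>/dev/null |
        grep -H --label="$file" "$@" -- "$pattern"
}

# Grep every *.pdf in a directory: pdf_grep_dir DIR PATTERN [GREP_OPTIONS...]
pdf_grep_dir() {
    local dir=$1 pattern=$2 f
    shift 2
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue   # glob matched nothing; skip the literal pattern
        pdf_grep_file "$f" "$pattern" "$@"
    done
}
```

For example, `pdf_grep_dir ~/datasheets Charac -i` would search every PDF in ~/datasheets case-insensitively and prefix each matching line with the file name.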