Searching PDF Files With grep
Comments
Why not use strings for the same effect with more portability?
for a in *.pdf; do
    # Quote the filename so names containing spaces are handled correctly
    strings "$a" | grep Copyleft
done
Because It Doesn't Work
I'm not sure if it never works, but it doesn't always work. PDF files contain compressed data and the characters of a word may not always be right next to each other in the data. If I use strings I can't find things that I could find with pdftotext:
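A minimal sketch of the pdftotext-based search, for comparison with the strings loop above. It assumes pdftotext (from poppler or xpdf) is installed; the function name pdfgrep_sketch is hypothetical and is not taken from the article.

```shell
# Hypothetical helper: print the name of each PDF whose extracted text
# contains the given pattern.
# Usage: pdfgrep_sketch PATTERN file1.pdf [file2.pdf ...]
pdfgrep_sketch() {
    pattern=$1
    shift
    for f in "$@"; do
        # "-" as the output file makes pdftotext write the text to stdout
        if pdftotext "$f" - 2>/dev/null | grep -q -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `pdfgrep_sketch Copyleft *.pdf` lists the matching files in the current directory. Because pdftotext decodes the compressed streams before grep sees them, this can find words that strings misses.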
Mitch Frazier is an Associate Editor for Linux Journal.
CentOS (Red Hat) uses "poppler-utils"
CentOS (Red Hat) uses "poppler-utils".
Using YUM, this installs both "poppler" and "poppler-utils".
THANKS FOR THIS NICE TIP!
great tip thanks
That was very useful. On Fedora I found xpdf, but no poppler and no xpdf-tools. Oh yes, it has pdftotext in it. Thanks! -t
SuSE PDF Package
When I search rpmfind.net, poppler-tools only provides rendering libraries (which requires poppler, based on the xpdf-3.0 code base). The utilities themselves are not provided in any poppler-tools rpm I can find. On my OpenSuSE 11 system, pdftotext is provided by xpdf-tools.
From http://poppler.freedesktop.org:
"Poppler is a PDF rendering library based on the xpdf-3.0 code base."
Check OpenSuSE's site
You can find the RPMs on opensuse.org:
http://download.opensuse.org/repositories/openSUSE:/11.0/standard/i586/poppler-tools-0.8.2-1.1.i586.rpm
http://download.opensuse.org/repositories/openSUSE:/11.0/standard/x86_64/poppler-tools-0.8.2-1.1.x86_64.rpm
I also use OpenSuSE 11.0:
Mitch Frazier is an Associate Editor for Linux Journal.
The poppler-tools RPM
The poppler-tools RPM doesn't appear to be the standard (yet -- see below), so it may not be available on all distros (which is probably why it's not on rpmfind.net). I have SLED11, based on OpenSuSE 11, but there is no poppler-tools RPM. I was not able to check Fedora, Red Hat or Debian-based distros (e.g. Ubuntu, Knoppix).
However, like I said, all of the contents of poppler-tools are included in xpdf-tools (with the exception of pdftohtml and pdftoabw). This is available from rpmfind.net for a few distros and appears to be more readily available for the more common distros.
The "rpm --whatprovides..." in your tip would still apply regardless of what the source RPM is; however, I mention xpdf-tools to help users who may be confused when --whatprovides returns something other than poppler-tools. The functionality of the pdfto[whatever] tools appears to be similar, though.
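For readers checking their own system, here is a minimal sketch of looking up the owning package on an RPM-based distro. The helper name owning_rpm is hypothetical. It assumes rpm is installed and the command is on PATH, and it uses rpm -qf (query by file), which is not necessarily the exact command from the tip.

```shell
# Hypothetical helper: report which installed RPM owns a given command.
# Usage: owning_rpm pdftotext
owning_rpm() {
    # Resolve the command name to a path; fail if it is not on PATH
    path=$(command -v "$1") || {
        echo "$1: not found in PATH" >&2
        return 1
    }
    # -q = query mode, -f = find the package that owns this file
    rpm -qf "$path"
}
```

Depending on the distro, `owning_rpm pdftotext` may name poppler-utils, poppler-tools, xpdf or xpdf-tools, as the comments in this thread suggest.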
Result of rpm -qp poppler-tools-0.8.2-1.1.i586.rpm --info:
Poppler is a fork of the xpdf PDF viewer developed by Derek Noonburg of
Glyph and Cog, LLC. The purpose of forking xpdf is twofold. First, to
provide PDF rendering functionality as a shared library to centralize
the maintenance effort. Today a number of applications incorporate the
xpdf code base and whenever a security issue is discovered, all these
applications exchange patches and put out new releases. In turn, all
distributions must package and release new versions of these xpdf based
viewers. It is safe to say that there is a lot of duplicated effort
with the current situation. Even if poppler in the short term
introduces yet another xpdf-derived code base to the world, it is hoped
that over time these applications will adopt poppler. After all, we
only need one application to use poppler to break even.
Second, we would like to move libpoppler forward in a number of areas
that do not fit within the goals of xpdf. By design, xpdf depends on
very few libraries and runs on a wide range of X-based platforms. This
is a strong feature and reasonable design goal. However, poppler
intends to replace parts of xpdf that are now available as standard
components of modern Unix desktop environments. One such example is
fontconfig, which solves the problem of matching and locating fonts on
the system in a standardized and well understood way. Another example
is cairo, which provides high quality 2D rendering. See the file TODO
for a list of planned changes.
So, as with most things in the wacky world of open source, it would appear poppler-tools is trying to move away from xpdf-based code and rely solely on libpoppler. This basically gives users a choice of which PDF library implementation to use.
a couple of quick scripts
Thanks for the tip, that's a great and elegant solution.
Going one step further, I threw together these two scripts: one for grepping a single PDF, the other for grepping a directory of PDFs.
http://iquaid.org/programs/grep-pdf
http://iquaid.org/programs/grep-pdf-multi
Typical bash script, more copyright than code. Let me know if you make any improvements, I'd like to use them, too. :)