Indexing with Glimpse
Since nearly my entire livelihood is maintained by exchanging electronic mail, my e-mail archives (not including many messages more than a year old) currently use nearly 100MB of my precious hard drive space. That is enough that I'm starting to consider buying a separate hard drive just for my personal files. In a desperate and somewhat successful attempt to keep better track of my e-mail archives, I recently installed exmh, a graphical mail program based on MH (a powerful but complex mail reader) and Tcl/Tk.
One optional program that exmh can use to help manage e-mail is glimpse. From some or all of your mail files, this program builds an index, which exmh uses to quickly search for any word you want to look up. I can now search all my e-mail archives for a long-lost letter in less than a minute.
But this article isn't about exmh, as useful as it is for MH users like me. It's about glimpse, an excellent program in its own right.
Unlike the well-known grep program, glimpse does not usually take an argument telling which files to search. Instead, by default, glimpse looks in every file which it has indexed. This means that glimpse requires an index to work.
Perhaps the simplest way to use glimpse is to index all your files and search them all when you are looking for something. To do that, you need to create the index with:
glimpseindex ~
which will index all your files, keeping the index in your home directory in files whose names start with .glimpse_. These files will usually take up about 2% to 3% of the total space of the files in your directory.
If you want to exclude certain files from the index, you can add their complete path names, or “wildcard” expressions with * and ? characters, to the file .glimpse_exclude. All of the .glimpse_* files are documented in the glimpseindex man page.
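As a rough illustration of such an exclude list, the sketch below writes a sample .glimpse_exclude. The paths and patterns are hypothetical, and the one-entry-per-line layout is an assumption; the glimpseindex man page has the exact matching rules. The script writes to a scratch directory unless GLIMPSE_DIR (a variable invented for this example) is set, so an existing list is not overwritten.

```shell
#!/bin/sh
# Sketch: write a sample exclude list for glimpseindex.
# Defaults to a scratch directory; set GLIMPSE_DIR="$HOME" to create
# the real file next to the index.
GLIMPSE_DIR="${GLIMPSE_DIR:-$(mktemp -d)}"
exclude_file="$GLIMPSE_DIR/.glimpse_exclude"

# One entry per line (assumed layout); every path here is hypothetical.
cat > "$exclude_file" <<'EOF'
/home/user/Mail/drafts
/home/user/tmp/*
*.bak
backup-??.log
EOF

echo "wrote $exclude_file"
```

The first entry is a complete path name; the others use the * and ? wildcards described above.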
Since your files probably change from time to time, you will need to update the index occasionally. You can either do this manually, using the same command you used to create the index, or create a “cron job” to do it for you (but scheduling jobs with cron is beyond the scope of this article).
Now that you have created an index, you can search through it. The easiest way to do this is to simply type:
glimpse word
Glimpse searches through the default index (the one in your home directory) and returns output similar to grep's with the file name prepended to each matching line.
Perhaps your search doesn't turn up the file you are looking for; the word might be misspelled in the file. If you want to allow a one-letter spelling mistake, you can instead use:
glimpse -1 word
Perhaps your search turns up far too many matches. You can limit the matches to only files with names matching a certain pattern with the -F flag. To search only in files ending in .c, use:
glimpse -F '\.c$' word
The argument following -F is a full regular expression, like the search patterns used by grep.
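Because the pattern is a regular expression, an unescaped dot matches any character, so it can help to preview a file-name pattern before using it. The sketch below uses grep as a stand-in for the -F matching (an assumption made for illustration; glimpse itself is not run), with made-up file names.

```shell
#!/bin/sh
# Sketch: preview which file names a pattern would select, using grep.
# The file names are invented for illustration.
names='main.c
util.c
notes.txt
misc
parser.cc'

# Unescaped dot: any character followed by a final "c".
loose=$(printf '%s\n' "$names" | grep '.c$')

# Escaped dot: a literal ".c" at the end of the name.
strict=$(printf '%s\n' "$names" | grep '\.c$')

# Unquoted expansion joins the matches onto one line.
printf 'loose:  %s\n' "$(echo $loose)"
printf 'strict: %s\n' "$(echo $strict)"
```

The loose pattern also picks up misc and parser.cc, while the escaped form keeps only main.c and util.c.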
You don't have to index only files in your home directory. The -H option specifies a different directory tree to index. The index files are stored in the specified directory. If you want to index the /usr/doc directory provided with many Linux distributions, log in as root (or another user that can write in the /usr/doc directory) and run:
glimpseindex -H /usr/doc
and then any user able to read the /usr/doc/.glimpse_index file will be able to search those documents with:
glimpse -H /usr/doc word
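Once there is more than one index, a small wrapper can query each in turn. This is only a sketch: the search_all function and the INDEX_DIRS variable are names invented here, and the directory list is an assumption. It relies only on the -H option and the .glimpse_index file name mentioned above.

```shell
#!/bin/sh
# Sketch: search several glimpse indexes in turn.
# INDEX_DIRS is a space-separated list (assumed); each directory must
# already hold an index built with "glimpseindex -H <dir>".
INDEX_DIRS="${INDEX_DIRS:-$HOME /usr/doc}"

search_all() {
    for dir in $INDEX_DIRS; do
        # Skip directories that have no index file yet.
        [ -f "$dir/.glimpse_index" ] || continue
        glimpse -H "$dir" "$@"
    done
}

# Run only when a search word is supplied, e.g.: sh search_all.sh -1 word
if [ $# -gt 0 ]; then
    search_all "$@"
fi
```

Any options given before the word, such as -1, are passed through to each glimpse call unchanged.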
If your searches aren't fast enough, you can trade disk space for time. Running glimpseindex with the -o flag builds an index that takes up 7% to 8% of the space of the files being indexed and increases search speed somewhat; the -b flag builds an index that takes up 20% to 30% extra space and increases search speed more.
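To put those percentages in concrete terms, here is a quick back-of-the-envelope calculation for roughly 100MB of indexed files, about the size of the mail archive mentioned earlier. The estimate function and the archive_mb variable are illustrative names, and the results are only as accurate as the percentages quoted in this article.

```shell
#!/bin/sh
# Sketch: rough index sizes for a given amount of indexed data.
# archive_mb is an illustrative value, not a measurement.
archive_mb=100

# Print "label: low-high MB" for a percentage range (integer arithmetic).
estimate() {
    label=$1; low_pct=$2; high_pct=$3
    low=$(( archive_mb * low_pct / 100 ))
    high=$(( archive_mb * high_pct / 100 ))
    printf '%s: %s-%s MB\n' "$label" "$low" "$high"
}

estimate "default" 2 3
estimate "-o" 7 8
estimate "-b" 20 30
```

For 100MB of files this gives about 2-3 MB for the default index, 7-8 MB with -o, and 20-30 MB with -b.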
If you search all the time, you can speed up your searches by running the glimpseserver program in the background. That is covered in the glimpseserver man page.
Glimpse can do more than I can cover here, so if you don't see what you are looking for, try it—or at least read the documentation—before giving up. In particular, glimpse supports the options used by agrep (approximate grep), a popular search program written by the authors of glimpse several years ago. agrep and its man page are included in the glimpse distribution. Its options include boolean searches of different kinds.
Glimpse is also the search engine used in the Harvest system, which “is an integrated set of tools to gather, extract, organize, search, cache, and replicate relevant information across the Internet”, according to the Harvest Web site at harvest.cs.colorado.edu.