The Searchable Site
So, now you have a searchable archive of your chosen sites, the coolest collection of links anywhere on your particular subject. Users everywhere can take advantage of your research and use your server to search through your highly optimized index. If you want, you now can serve ads in order to generate revenue and support your site. Back on the Manage Archive screen shown in Figure 2, check the box labeled Optional - include Sponsored SearchFeed links. Then, click on Set up/manage Account, which is a link to set up an account with Searchfeed.com. An on-line advertising and content provider company, Searchfeed.com provides sponsored search results that are supposed to be relevant to the keywords on which the user searches. Once your account is set up, simply enter the partner ID and track ID provided by Searchfeed.com and choose how many ads should appear at the top of your search results. It's pretty simple to set up. To get the most out of your ads, you can use the suite of on-line tools provided by Searchfeed.com to monitor what keywords users are searching on, which ads they are clicking on and how much you make from each click.
Whether or not you choose to add sponsored links to your search results, very likely you will want to wrap them in the “skin” of your site—your own look, feel and navigation menus. To accomplish this, you need to edit the file named wgoutput.cfg in the archive directory. (The location of the archive directory is shown on the Manage Archive screen.) This file contains the snippets of HTML code that go above, below and in between individual search results. You also can include your own header and footer files instead of typing in the HTML.
In some cases, you also may want to customize the ranking order of your search results. Webglimpse, unlike some search engines, doesn't claim to know what “percent relevant” a particular page is to the user. Instead, it lets you see under the hood how it calculates relative relevance of search results, and if you like, you can implement your own customized relevance ranking formula(s). Simply edit the file .wgrankhits.cfg in that same archive directory with a snippet of Perl code using these available variables:
# Available variables are: # # $N # of times the word appears # $LineNo Where in the file the word appears # $TITLE # of matches in the TITLE tag # $FILE # of matches in the file path # $Days Date (how many days old the file is) # $META Total # of matches in any META tag # $LinkPop Link popularity in the site (how # many times other pages link to it) # %MetaHash Hash with the # of times the word # appears in each META tag, indexed # by the NAME= parameter. # $LinkString actual url of link # The following uncommented lines # are the actual ranking formulae that will be used # This is the default ranking, it gives high weight # to keywords in the title, plus some weight to # regular hits, link popularity and freshness $TITLE * 10 + $N + $LinkPop + 5/($Days + 1)
By making use of the $LinkString variable, for instance, you can make sure that selected regions of your site always appear above others. In the Webglimpse home page, for example, we add this term to the default formula to make sure that pages in the /docs directory appear first in the search results:
+ ($LinkString =~ /\/docs\/)*1000
By now you may have an inkling of what the strengths and weaknesses are of Webglimpse: a bunch of neat features that are directly configurable by the user, and a bunch of neat features combined in a somewhat ad hoc manner. Webglimpse has, depending on your perspective, enjoyed or suffered from a great deal of tweaking to make it able to perform a lot of different tasks. The next version, which is in the works at the time of this writing, is intended to be simpler to install and maintain, and even to have an FTP-only install for users without shell access to their servers. Be that as it may, the most common problems you are likely to run into with the current version are as follows:
Permissions issues—these occur when you sometimes re-index from the Web administration interface, and sometimes from a shell or from your crontab. You can re-index any archive either by pressing the Build Index button in the Manage Archive screen or by running the script ./wgreindex from the archive directory. The best thing to do is decide on one way to re-index, stick to it and make the archive owned by the user who will run the re-index script.
URL/file translation issues—these occur mainly when the DocumentRoot is not correctly specified. You can check what file a given URL will be translated into or vice versa by pressing the Test Path Translations button on the main Web administration screen. All the applicable settings for local and remote domains are stored in this file: /usr/local/wg2/archives/wgsites.conf. You can edit wgsites.conf directly, or make changes by pressing the Edit Domain Configuration button in the Manage Archive screen.
More troubleshooting tips are available in the Documentation and How-tos page (see Resources).
Webinar: 8 Signs You’re Beyond Cron
11am CDT, April 29th
Join Linux Journal and Pat Cameron, Director of Automation Technology at HelpSystems, as they discuss the eight primary advantages of moving beyond cron job scheduling. In this webinar, you’ll learn about integrating cron with an enterprise scheduler.Join us!
- Picking Out the Nouns
- Tips for Optimizing Linux Memory Usage
- "No Reboot" Kernel Patching - And Why You Should Care
- DevOps: Better Than the Sum of Its Parts
- Return of the Mac
- Android Candy: Intercoms
- Drupageddon: SQL Injection, Database Abstraction and Hundreds of Thousands of Web Sites
- Non-Linux FOSS: .NET?
- Consent That Goes Both Ways