The Searchable Site
So now you have a searchable archive of your chosen sites, the coolest collection of links anywhere on your particular subject. Users everywhere can take advantage of your research and use your server to search your highly optimized index. If you want, you now can serve ads to generate revenue and support your site. Back on the Manage Archive screen shown in Figure 2, check the box labeled Optional - include Sponsored SearchFeed links. Then, click on Set up/manage Account, a link for setting up an account with Searchfeed.com, an on-line advertising and content provider that supplies sponsored search results intended to be relevant to the keywords the user searches on. Once your account is set up, simply enter the partner ID and track ID provided by Searchfeed.com and choose how many ads should appear at the top of your search results. It's pretty simple to set up. To get the most out of your ads, you can use Searchfeed.com's suite of on-line tools to monitor which keywords users are searching on, which ads they are clicking on and how much you make from each click.
Whether or not you choose to add sponsored links to your search results, very likely you will want to wrap them in the “skin” of your site—your own look, feel and navigation menus. To accomplish this, you need to edit the file named wgoutput.cfg in the archive directory. (The location of the archive directory is shown on the Manage Archive screen.) This file contains the snippets of HTML code that go above, below and in between individual search results. You also can include your own header and footer files instead of typing in the HTML.
In some cases, you also may want to customize the ranking order of your search results. Webglimpse, unlike some search engines, doesn't claim to know what “percent relevant” a particular page is to the user. Instead, it lets you see under the hood how it calculates the relative relevance of search results, and if you like, you can implement your own customized relevance-ranking formulas. Simply edit the file .wgrankhits.cfg in that same archive directory with a snippet of Perl code using these available variables:
# Available variables are:
#
# $N          # of times the word appears
# $LineNo     Where in the file the word appears
# $TITLE      # of matches in the TITLE tag
# $FILE       # of matches in the file path
# $Days       Date (how many days old the file is)
# $META       Total # of matches in any META tag
# $LinkPop    Link popularity in the site (how
#             many times other pages link to it)
# %MetaHash   Hash with the # of times the word
#             appears in each META tag, indexed
#             by the NAME= parameter.
# $LinkString actual url of link
#
# The following uncommented lines are the actual
# ranking formulae that will be used.
# This is the default ranking; it gives high weight
# to keywords in the title, plus some weight to
# regular hits, link popularity and freshness.
$TITLE * 10 + $N + $LinkPop + 5/($Days + 1)
By making use of the $LinkString variable, for instance, you can make sure that selected regions of your site always appear above others. On the Webglimpse home page, for example, we add this term to the default formula to make sure that pages in the /docs directory appear first in the search results:
+ ($LinkString =~ /\/docs\//)*1000
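As a further sketch (the weights here are arbitrary, and $MetaHash{keywords} assumes your pages carry a META keywords tag), you could combine the default ranking with both a META boost and a directory boost in a single formula line:

$TITLE * 10 + $N + $LinkPop + 5/($Days + 1) + $MetaHash{keywords} * 5 + ($LinkString =~ /\/docs\//) * 1000

This works because in Perl a match in scalar context evaluates to 1 on success and to the empty string (0 numerically) on failure, so multiplying by 1000 boosts only the matching pages.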
By now you may have an inkling of Webglimpse's strengths and weaknesses: a bunch of neat features directly configurable by the user, and a bunch of neat features combined in a somewhat ad hoc manner. Webglimpse has, depending on your perspective, enjoyed or suffered from a great deal of tweaking to make it perform a lot of different tasks. The next version, in the works at the time of this writing, is intended to be simpler to install and maintain, and even to offer an FTP-only install for users without shell access to their servers. Be that as it may, the most common problems you are likely to run into with the current version are as follows:
Permissions issues—these occur when you re-index sometimes from the Web administration interface and sometimes from a shell or from your crontab. You can re-index any archive either by pressing the Build Index button on the Manage Archive screen or by running the script ./wgreindex from the archive directory. The best thing to do is decide on one way to re-index, stick to it and make the archive owned by the user who runs the re-index script.
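If you settle on cron as your one way to re-index, a crontab entry along these lines keeps things consistent (the archive path here is a hypothetical example; use the location shown on your Manage Archive screen, and install the entry in the crontab of the user who owns the archive):

# Re-index the archive nightly at 3:15 AM, always as the same user
15 3 * * * cd /usr/local/wg2/archives/myarchive && ./wgreindex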
URL/file translation issues—these occur mainly when the DocumentRoot is not correctly specified. You can check what file a given URL will be translated into or vice versa by pressing the Test Path Translations button on the main Web administration screen. All the applicable settings for local and remote domains are stored in this file: /usr/local/wg2/archives/wgsites.conf. You can edit wgsites.conf directly, or make changes by pressing the Edit Domain Configuration button in the Manage Archive screen.
More troubleshooting tips are available on the Documentation and How-tos page (see Resources).
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can locate a particular file or files based on all kinds of criteria. It's easy to string these tools together to build even more powerful ones, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality means UNIX system administrators always seem to have the right tool for the job.
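That log-search combination can be sketched in a line of shell. The directory tree and search string below are made-up stand-ins; the article's /home would take the place of the demo directory:

```shell
# Create a throwaway tree standing in for /home (hypothetical data)
mkdir -p /tmp/demo/alice /tmp/demo/bob
echo "ERROR: disk full"  > /tmp/demo/alice/app.log
echo "INFO: all is well" > /tmp/demo/bob/app.log
echo "not a log file"    > /tmp/demo/bob/notes.txt

# Find every .log file in the tree, then list only those
# containing the entry we care about
find /tmp/demo -name '*.log' -exec grep -l 'ERROR' {} +
# prints /tmp/demo/alice/app.log
```

grep -l prints just the names of the matching files; drop the -l to see the matching lines themselves.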
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, briefly describing how to know when it might be time to upgrade your job-scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.
With all the industry talk about the benefits of Linux on Power and the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn't consider the total cost of ownership, and it doesn't consider the advantages of real processing power, high availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance of this open architecture to bear for your organization. There are no smoke and mirrors here—just cold, hard, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.