Web Analysis Using Analog
Analog can be configured to read customized log formats, which is useful if you have log files in various formats created by different servers. Even though I've used a number of different servers, I've been able to keep using Analog to analyze new and old log files (of different formats) by listing the log format before giving the name and path of each log file. I now use the Apache web server's combined log format, which produces a single log file that records the referrer and browser information with the log entry for each access. Otherwise, I'd have separate log files, one for the referrer and another for the browser, and would need to include these log files in Analog's configuration files.
If you're a hostmaster, you can configure Apache to use a different log file for each virtual host. This keeps the information for each host separate and makes using Analog to analyze your virtual host log files much more straightforward. This is done using Apache's virtual host directive:
<VirtualHost vhost1.com>
    ServerAdmin firstname.lastname@example.org
    DocumentRoot /www/docs/vhhost1.com
    ServerName vhost1.com
    ErrorLog logs/vhost1.com-error_log
    CustomLog logs/vhost1.com-access_log combined
</VirtualHost>
While you can use Analog with just the analog.cfg file to tell it what to do and where to save its report, if you want to create different reports for virtual hosts and individual pages, it's best to use multiple configuration files. Each configuration file serves a different purpose and can be combined with script files containing command-line switches for Analog.
In this scenario, Analog is run not once, but several times; each run creates a separate report. The analog.cfg file includes only a few basic commands that relate to our main site, not the virtual host sites. When creating reports for virtual hosts, I use the -G command-line switch to keep Analog from reading analog.cfg.
The basic arrangement resembles a pyramid. The master.cfg file holds all the major items that apply broadly to every virtual host on our system. Items relating only to a specific virtual host, along with its general preferences, are in the next tier, and individual page.cfg files make up the last tier. This allows me to create specialized setups as needed and still track individual hosts, sites and pages without making major changes.
When Analog is run for a virtual host, the master.cfg file is called first, followed by the master-vhost.cfg (I replace “vhost” with the name of the host when naming the file), and finally, single-page configuration files for separate pages. An example master.cfg file is included here (see Listing 1).
An example vhost.cfg file is shown in Listing 2. As you can see, it's fairly general, since most of the report formatting is handled by the master.cfg file. The vhost.cfg file can be used to create a “total activity” report for the virtual host. The command line (or script file entry), shown without paths for clarity, is:
analog -G +gmaster.cfg +gvhost1.cfg +Ovhost1-total.html
The -G tells Analog not to use analog.cfg (which is used for the main host only). +g is used whenever we use additional configuration files: there's no space between it and the file name. +O designates the output file name: it's the letter O, not the number zero.
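Since the same command is repeated for every virtual host, it can be generated by a small script. The following is only a sketch under stated assumptions: the function name vhost_total_cmd and the host name vhost2 are hypothetical, the file names follow the pattern shown above, and paths are omitted as in the text. It prints the commands rather than running Analog.

```shell
#!/bin/sh
# Hypothetical helper: print the Analog command line that creates the
# "total activity" report for one virtual host.
#   $1 = virtual host name (used in the config and output file names)
vhost_total_cmd() {
    # -G skips analog.cfg, +g adds a configuration file,
    # +O (letter O) names the output file.
    printf 'analog -G +gmaster.cfg +g%s.cfg +O%s-total.html\n' "$1" "$1"
}

# Dry run: show the command for each host (vhost2 is an assumed example).
for host in vhost1 vhost2; do
    vhost_total_cmd "$host"
done
```

Printing first makes it easy to check the switches before real paths are added and the commands are executed.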
Single-page configuration files give the basic information on the file(s) to include in the report (using the FILEINCLUDE command). The HOSTNAME and HOSTURL directives set the items that appear at the top of each report after the words “Web Server Statistics for”. For individual pages, we use the name and URL of the page rather than the host name or URL. A single-page configuration file can be three or more lines, as shown in Listing 3.
Notice that the log file to use, output file and report-formatting commands aren't included; these items are set either in the master.cfg files or within the script file when Analog is run. This lets me use the same information when creating the daily and monthly reports, even though the two reports are very different.
The FILEINCLUDE command causes Analog to search through the logs and retrieve data relating to only the file you've specified. It's a very powerful command, and is normally used in the configuration files for individual pages or sites. It can also be used with a wild card; if I wanted to include all files in the widgets directory, I would use:
The command line used to create a daily report for this page (all on one line), shown without path information for clarity, is:
analog -G +gmaster.cfg +gmaster-vhost1.cfg +gwidgets.cfg +Owidget.html
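The three-tier order (master file, then host file, then page file) can be captured in a short script so that each page report is built the same way. This is a minimal sketch, not the author's script: the function name page_report_cmd is hypothetical, paths are omitted as in the text, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print the Analog command line for a single-page report.
#   $1 = virtual host name, $2 = page configuration file, $3 = output file
page_report_cmd() {
    # Configuration files are listed from general to specific:
    # master.cfg, then master-<vhost>.cfg, then the page's own file.
    printf 'analog -G +gmaster.cfg +gmaster-%s.cfg +g%s +O%s\n' "$1" "$2" "$3"
}

# Reproduces the widgets example from the text.
page_report_cmd vhost1 widgets.cfg widget.html
```

Passing the output file separately lets the same page configuration feed both the daily and the monthly report, with only the script arguments changing.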