Web Analysis Using Analog
Analog can be configured to use customized log formats, which is a very good thing if you happen to have log files in various formats created by different servers. Even though I've used a number of different servers, I've been able to continue using Analog to analyze new and old log files (of different formats) by listing the type of log format before giving the name and path of the log file. I now use the Apache web server's combined log format, which produces a common log file that lists the referrer and browser information with the log entry for each access. Otherwise, I'd have separate log files, one for the referrer and another for the browser, and would need to include these log files when working with Analog's configuration files.
If you're a hostmaster, you can configure Apache to use a different log file for each virtual host. This keeps the information for each host separate and makes using Analog to analyze your virtual host log files much more straightforward. This is done using Apache's virtual host directive:
<VirtualHost vhost1.com>
    ServerAdmin firstname.lastname@example.org
    DocumentRoot /www/docs/vhost1.com
    ServerName vhost1.com
    ErrorLog logs/vhost1.com-error_log
    CustomLog logs/vhost1.com-access_log combined
</VirtualHost>
While you can use Analog with just the analog.cfg file to tell it what to do and where to save its report, if you want to create different reports for virtual hosts and individual pages, it's best to use multiple configuration files. Each configuration file serves a different purpose and can be combined with script files containing command-line switches for Analog.
In this scenario, Analog is run not once, but several times; each run creates a separate report. The analog.cfg file includes only a few base commands that relate to our main site, not the virtual host sites. When creating reports for virtual hosts, I use the -G command-line switch to keep Analog from reading analog.cfg.
The basic arrangement is similar to a pyramid format. All major items are in a master.cfg file to cover the broad category of all virtual hosts on our system. Items relating only to a specific virtual host and their general preferences are in the next tier, and finally, individual page.cfg files are in the last category. This allows me to create specialized setups as needed and still track individual hosts, sites and pages without making major changes.
When Analog is run for a virtual host, the master.cfg file is called first, followed by the master-vhost.cfg (I replace “vhost” with the name of the host when naming the file), and finally, single-page configuration files for separate pages. An example master.cfg file is included here (see Listing 1).
An example vhost.cfg file is shown in Listing 2, and as you can see, it's fairly general, since most of the report formatting is handled by the master.cfg file. The vhost.cfg file can be used to create a “total activity” report for the virtual host. The command line (typed at the prompt or placed in a script file), shown without paths for clarity, is:
analog -G +gmaster.cfg +gvhost1.cfg +Ovhost1-total.html
The -G tells Analog not to use analog.cfg (which is used for the main host only). +g is used whenever we use additional configuration files: there's no space between it and the file name. +O designates the output file name: it's the letter O, not the number zero.
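If several virtual hosts need the same “total activity” report, the command above can be wrapped in a small shell script. What follows is only a sketch: the function name build_report_cmd and the second host name (vhost2) are hypothetical, and the script prints each command line instead of running Analog, so the switches can be checked first.

```shell
#!/bin/sh
# Sketch only: prints one Analog command line per virtual host.
# build_report_cmd and the host name vhost2 are hypothetical.

# build_report_cmd HOST
#   -G  skip the default analog.cfg
#   +g  read an extra configuration file (no space before the name)
#   +O  name the output file (letter O, not zero)
build_report_cmd() {
    host=$1
    printf 'analog -G +gmaster.cfg +g%s.cfg +O%s-total.html\n' "$host" "$host"
}

for host in vhost1 vhost2; do
    build_report_cmd "$host"
done
```

Once the printed lines look right, the output can be piped to sh, with paths added to suit your own layout.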
Single configuration files are used to give the basic information on the file(s) to include in the report (using the FILEINCLUDE command). The HOSTNAME and HOSTURL directives are the items that will appear at the top of each report after the words “Web Server Statistics for”. For individual pages, we use the name and URL of the page rather than the host name or URL. A single-page configuration file can be three or more lines, as shown in Listing 3.
Notice that the log file to use, output file and report-formatting commands aren't included; these items are set either in the master.cfg files or within the script file when Analog is run. This lets me use the same information when creating the daily and monthly reports, even though the two reports are very different.
The FILEINCLUDE command causes Analog to search through the logs and retrieve data relating to only the file you've specified. It's a very powerful command, and is normally used in the configuration files for individual pages or sites. It can also be used with a wild card; if I wanted to include all files in the widgets directory, I would use:
The command line used to create a daily report for this page (all on one line), shown without path information for clarity, is:
analog -G +gmaster.cfg +gmaster-vhost1.cfg +gwidgets.cfg +Owidget.html
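The three-tier order (master, then host, then page) is easy to get wrong when typing long command lines by hand, so it may help to assemble them in a script. This is a sketch under assumptions: build_page_cmd is a hypothetical helper, the gadgets page is an invented second example, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: assembles the master -> host -> page command line.
# build_page_cmd and the "gadgets" page are hypothetical.

# build_page_cmd HOST PAGE OUTPUT
build_page_cmd() {
    host=$1    # virtual host name, as used in master-HOST.cfg
    page=$2    # page configuration file name, without .cfg
    output=$3  # report file to write
    printf 'analog -G +gmaster.cfg +gmaster-%s.cfg +g%s.cfg +O%s\n' \
        "$host" "$page" "$output"
}

build_page_cmd vhost1 widgets widget.html
build_page_cmd vhost1 gadgets gadget.html   # hypothetical second page
```

Keeping the output name as a separate argument means the same page configuration can feed both the daily and the monthly report.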