Creating Web Plots on Demand
The first section of Listing 1 requires little explanation. I just assign variables for file names and directories that will be used later. This is the section you might change if your web server's directories are different from mine.
The Perl script is meant to be started when a web browser requests it. In other words, it is a CGI program. A Common Gateway Interface (CGI) program provides a standard mechanism for providing “interactive content” over the Internet. Instead of requesting a static HTML document, the web browser asks the web server to run a program that will dynamically create an HTML document. CGI programs can be written in any language, but Perl is by far the most common language for CGI programming today.
My CGI script will actually run as a stand-alone program from the command line, although it isn't particularly useful when run that way. Running as a CGI program, it gains one important aspect: access to the QUERY_STRING environment variable. The QUERY_STRING environment variable is one way to pass information to a CGI program. If QUERY_STRING has been defined, then our program expects it to contain a date. This date should be in dd/mmm/yyyy format (e.g., 12/Sep/1997). That's the same format Apache uses to log its dates.
If QUERY_STRING isn't defined, we'll assume we should look for access data for today's date. In either case, the date we're seeking is stored in the variable $date.
In section 3 of Listing 1, we open the Apache access log and begin looking for lines that contain our target date. When we find a date that matches, we extract the hour and minute and place them in the variables $inhour and $inmin.
It's at this point that we run into a slight conceptual problem. We want to plot time of day on the x axis of our plot, but what we actually have is hours and minutes. What value will be plotted on the x axis? What we want is a time line from 12:01 AM to midnight. If we just turned times into integers, we would get values like 1024 for 10:24 AM. gnuplot could plot this, but since there are only 60 minutes in an hour, there would be a gap at the end of each plotted hour. There are many ways to solve this problem. The one I used was simply to scale the minutes 0 to 59 to a value between 0 and 99. The hours are then multiplied by 100 and the converted minutes are added to them. Thus, the time 1:30 PM becomes the integer value 1350. Our plot will only have tic marks for hours, so no one will notice our little conversion.
So in this section of code, we convert the hours and minutes to an integer value and use it as an array index for an array of counters called $accesses.
gnuplot can plot the data it reads from a text file. We need to write a file for gnuplot, where each line of the file contains two integer values. The first value will be our time value (the x axis) and the second value will be the number of web hits at that time (the y axis). The file will contain 2401 values. Here are a few lines from an example file:
0808 1 0809 0 0810 1 0811 2 0812 0 0813 21 0814 0 0815 12
This file shows a few minutes during the 8 o'clock hour. We create the file by printing out the data in our $accesses array. Here we encounter another little “gotcha”. Perl creates something called “sparse arrays”. This just means that array elements are created as they are defined. In other words, if there were no accesses at 8:12 AM, as in our sample data above, then no corresponding array element is created. It's undefined.
Normally in Perl, you can traverse an array using a foreach loop. In our case though, we need to output data for all time intervals, even those times when no accesses took place. We need this so we have a contiguous set of data for gnuplot to graph.
This is actually very easy to do. In section 4 of Listing 1, we set up a simple for loop. Using Perl's defined function, we determine if an array element exists. If it does, we simply print the index value as our x value and the access count as our y value. If the array element isn't defined, we know there were no accesses at that time, so we print out a zero as our y value.
|Speed Up Your Web Site with Varnish||Jun 19, 2013|
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?