Caching the Web, Part 2
Last month we discussed the basic concepts of proxy servers and caching. Now, let's see how to implement this technology in your organization. A few proxy-server programs are on the market, such as MS-PROXY, aka Catapult, available only for Windows NT, and Netscape Proxy Server, available for different UNIX platforms and Windows NT. Both have two main drawbacks: they are commercial software and they don't support ICP. The excellent Apache web server has included a proxy-cache module since its 1.2 version. This module is a very interesting option: it's free, and works with the most popular web server on the Net. However, it doesn't use ICP, and its robustness is not comparable to the best choice for a proxy-cache server—Squid.
Squid is a high-performance proxy-cache server derived from the cache module of the Harvest Research Project, maintained by Duane Wessels. It supports FTP, gopher, WAIS and HTTP objects. It stores hot objects in RAM and maintains a robust database of objects in disk directories. Squid also supports the SSL protocol for proxying secure connections and has a complex access control mechanism. Another interesting feature of Squid is negative caching, which saves “connection refused” and “404 Not Found” replies for a short period of time (usually five minutes).
Squid consists of several programs, including:
squid: the main proxy server
dnsserver: a DNS lookup program that performs single, blocking DNS operations
unlinkd: a program to delete files in the background from the cache directory
It also provides a CGI program, designed to be run through a web interface, that outputs statistics about its configuration and performance and allows some management capabilities.
Installing Squid is easy. Just download the source archive from http://squid.nlanr.net/ and, in a temporary directory, type:
gzip -dc squid-x.y.z-src.tar.gz | tar xvf -
Next, compile and install the software by typing:
cd squid-x.y.z
./configure
make all
make install
These commands install all needed programs and configuration files under /usr/local/squid. The binaries go in its bin subdirectory and the configuration files in conf. Log files are kept in the logs subdirectory, and the object database in the cache directory and its subdirectories. The bin directory also holds a shell script called RunCache, which runs the squid binary and restarts it automatically if the process dies for any reason. So, add a line to your rc.local file that runs RunCache.
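A minimal sketch of such an rc.local entry follows. It assumes the default /usr/local/squid prefix described above; the SQUID_HOME variable and the existence check are illustrative additions, not part of the Squid distribution.

```shell
# Hypothetical rc.local fragment: start the RunCache wrapper at boot.
# SQUID_HOME is an assumed variable name; adjust it if you installed elsewhere.
SQUID_HOME=/usr/local/squid

# Only try to start the wrapper if it is actually installed and executable.
if [ -x "$SQUID_HOME/bin/RunCache" ]; then
    # Run in the background so the rest of the boot sequence is not blocked.
    "$SQUID_HOME/bin/RunCache" &
fi
```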
If Squid cannot start because of a configuration problem, an error log is written to /usr/local/squid/squid.out.
Of course, you can instead install an RPM version of Squid if you use Red Hat Linux or another distribution that supports RPM packages.
Squid installs a sample configuration file called squid.conf with many comments for each option. Here you can change the HTTP and ICP ports (3128 and 3130 by default, respectively), define how much memory and disk space to reserve for caching objects, and set other parameters such as refresh patterns and access control restrictions. Of course, you need an ICP port only if your cache is going to be the sibling or parent of other caches. The directives for changing these values are http_port, icp_port, cache_dir and cache_swap. Additionally, you can set the maximum object size to be stored in the database; the default is 4MB. Also, you should uncomment the following lines in this file:
cache_effective_user nobody
cache_effective_group nobody
This avoids running Squid as root, a dangerous habit for anyone who runs servers like httpd or gopherd. If you are using a recent version of Squid (at the time of this writing, the current version is 1.1.16), it will refuse to start as root and will instead write an error message to the squid.out file.
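To check which of these directives are actually active (uncommented) after editing, something like the following sketch can help. The function name show_squid_settings is hypothetical, and the default path simply assumes the conf directory layout described earlier.

```shell
# Print the active (uncommented) lines for the directives discussed above.
# show_squid_settings is an illustrative helper, not a Squid tool.
show_squid_settings() {
    # Assumed default location; pass another path as the first argument.
    conf=${1:-/usr/local/squid/conf/squid.conf}
    # Lines starting with '#' do not match, so commented-out options are skipped.
    grep -E '^[[:space:]]*(http_port|icp_port|cache_dir|cache_swap|cache_effective_user|cache_effective_group)[[:space:]]' "$conf"
}
```

Running show_squid_settings with no argument reads the assumed default file. If cache_effective_user is missing from the output, the line is probably still commented out.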
To let Squid use 100MB of your hard disk, the cache_dir directive should look something like this:
cache_dir /usr/local/squid/cache 100 16 256
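As a rough worked example, assume the three numbers are read in the usual way for this form of the directive: size in megabytes, number of first-level subdirectories, and number of second-level subdirectories under each of those. The variable names below are illustrative only.

```shell
# Values taken from the cache_dir line above.
cache_mb=100   # disk space for the cache, in MB
l1=16          # first-level subdirectories (assumed meaning)
l2=256         # second-level subdirectories per first-level one (assumed meaning)

# Total number of leaf directories that hold cached objects.
leaf_dirs=$((l1 * l2))

# Average share of the cache per leaf directory, in KB (integer division).
kb_per_dir=$((cache_mb * 1024 / leaf_dirs))

echo "leaf directories: $leaf_dirs"
echo "average KB per leaf directory: $kb_per_dir"
```

With these values the script reports 4096 leaf directories and an average of 25KB each, which is why creating the directory tree is a separate step before the first run.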
Before starting Squid for the first time, create the cache and logs directories. To build the cache and hashed subdirectories, you should execute the commands:
cd /usr/local/squid
mkdir cache
chown -R nobody cache
cd /usr/local/squid/bin
./squid -z
Finally, to create and change the owner of the logs directory:
cd /usr/local/squid
mkdir logs
chown nobody logs
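The two preparation steps can also be combined into one small helper. This is only a sketch: prepare_squid_dirs is a hypothetical name, and the checks for root privileges and for the presence of the squid binary are additions meant to let the function run harmlessly on a machine where Squid is not installed yet.

```shell
# Create the cache and logs directories, hand them to an unprivileged
# user, and build the hashed cache subdirectories.
# prepare_squid_dirs is an illustrative helper, not part of Squid.
prepare_squid_dirs() {
    prefix=${1:-/usr/local/squid}   # install prefix used in this article
    owner=${2:-nobody}              # same user as cache_effective_user

    mkdir -p "$prefix/cache" "$prefix/logs" || return 1

    # Changing ownership needs root and an existing account; skip otherwise.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown -R "$owner" "$prefix/cache" "$prefix/logs"
    fi

    # Build the hashed subdirectories only if the squid binary is present.
    if [ -x "$prefix/bin/squid" ]; then
        (cd "$prefix/bin" && ./squid -z)
    fi
}
```

Calling prepare_squid_dirs with no arguments uses the defaults from this article. Pass a different prefix or user as the first and second arguments.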
Now Squid can be run safely for the first time by starting RunCache as described above. It will spawn several dnsserver processes and write its PID in the file logs/squid.pid. Important warning or error messages can be found in the squid.out and logs/cache.log files. Remember, if you want to shut down the cache, you must first kill the RunCache process to avoid an immediate restart and then type:
/usr/local/squid/bin/squid -k shutdown
Never use kill -9 to shut down the cache, because it doesn't close the object database in such a way that it can be recovered—you'll probably lose part of it.
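The shutdown order described above (stop the RunCache wrapper first, then ask Squid to exit cleanly) might be wrapped up as follows. This is a sketch under stated assumptions: stop_squid is a hypothetical name, pkill must be available on your system, and RunCache is assumed to have been started by its full path so that the pattern matches its command line.

```shell
# Stop the RunCache wrapper, then shut Squid down cleanly.
# stop_squid is an illustrative helper, not part of Squid.
stop_squid() {
    prefix=${1:-/usr/local/squid}

    # Refuse to continue if there is no squid binary under this prefix.
    if [ ! -x "$prefix/bin/squid" ]; then
        echo "no squid binary under $prefix" >&2
        return 1
    fi

    # Stop the wrapper first so it cannot restart Squid immediately.
    # pkill returns non-zero when nothing matched; that is not an error here.
    pkill -f "$prefix/bin/RunCache" || true

    # Ask Squid itself for an orderly shutdown (never kill -9).
    "$prefix/bin/squid" -k shutdown
}
```

The function deliberately uses the documented -k shutdown option rather than signalling the Squid process directly, so the object database is closed in a recoverable state.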