Speed Up Your Web Site with Varnish
The options given to the GET command aren't important here. The important thing is that the URL points to the port on which varnishd is listening. There are three response headers that were added by Varnish. They are X-Varnish, Via and Age. These headers are useful once you know what they are. The X-Varnish header will be followed by either one or two numbers. The single-number version means the response was not in Varnish's cache (miss), and the number shown is the ID Varnish assigned to the request. If two numbers are shown, it means Varnish found a response in its cache (hit). The first is the ID of the request, and the second is the ID of the request from which the cached response was populated. The Via header just shows that the request went through a proxy. The Age header tells you how long the response has been cached by Varnish, in seconds. The first response will have an Age of 0, and subsequent hits will have an incrementing Age value. If subsequent responses to the same page don't increment the Age header, that means Varnish is not caching the response.
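The one-number versus two-number rule for X-Varnish can be sketched as a small shell function. This is only an illustration: the function name classify_x_varnish and the sample IDs are made up for this example, and the function expects the header's value without the "X-Varnish:" prefix.

```shell
# Classify an X-Varnish header value as a cache hit or miss.
# One number  -> miss (the ID Varnish assigned to this request).
# Two numbers -> hit  (this request's ID, then the ID of the request
#                      that populated the cached response).
classify_x_varnish() {
    # Intentionally unquoted so the shell splits the value on whitespace.
    set -- $1
    case $# in
        1) echo miss ;;
        2) echo hit ;;
        *) echo unknown ;;
    esac
}

# Hypothetical header values, for illustration only.
classify_x_varnish "1166429087"
classify_x_varnish "1166429088 1166429087"
```

The first call prints miss and the second prints hit.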
Now let's look at the varnishstat command launched earlier. You should see something similar to Figure 2.
Figure 2. varnishstat Command
The important lines are cache_hit and cache_miss. The cache_hit line won't be shown if you haven't had any hits yet. As more requests come in, the counters are updated to reflect hits and misses.
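A common way to read these two counters together is as a hit ratio: hits divided by the total number of cache lookups. The sketch below assumes you copy the two counter values by hand; the function name hit_ratio is made up for this example.

```shell
# Print the cache hit ratio as a whole-number percentage.
# $1 = value of the cache_hit counter
# $2 = value of the cache_miss counter
hit_ratio() {
    hits=$1
    misses=$2
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo "n/a"    # no cache lookups yet
    else
        echo "$((100 * hits / total))%"
    fi
}

hit_ratio 75 25
```

With 75 hits and 25 misses, this prints 75%. Integer division rounds down, which is usually precise enough for a quick check.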
Next, let's look at the varnishlog command launched earlier (Figure 3).
Figure 3. varnishlog Command
This shows you fairly verbose details of the requests and responses that have gone through Varnish. The documentation on the Varnish Web site explains the log output as follows:
The first column is an arbitrary number, it defines the request. Lines with the same number are part of the same HTTP transaction. The second column is the tag of the log message. All log entries are tagged with a tag indicating what sort of activity is being logged. Tags starting with Rx indicate Varnish is receiving data and Tx indicates sending data. The third column tells us whether this is data coming or going to the client (c) or to/from the back end (b). The fourth column is the data being logged.
varnishlog has various filtering options to help you find what you're looking for. I recommend playing around and getting comfortable with varnishlog, because it will really help you debug Varnish. Read the varnishlog(1) man page for all the details. Next are some simple examples of how to filter with varnishlog.
To view communication between Varnish and the client (omitting the back end):
/usr/local/bin/varnishlog -c
To view communication between Varnish and the back end (omitting the client):
/usr/local/bin/varnishlog -b
To view the headers received by Varnish (both the client's request headers and the back end's response headers):
/usr/local/bin/varnishlog -i RxHeader
Same thing, but limited to just the client's request headers:
/usr/local/bin/varnishlog -c -i RxHeader
Same thing, but limited to just the back end's response headers:
/usr/local/bin/varnishlog -b -i RxHeader
To write all log messages to the /var/log/varnish.log file and dæmonize:
/usr/local/bin/varnishlog -Dw /var/log/varnish.log
To read and display all log messages from the /var/log/varnish.log file:
/usr/local/bin/varnishlog -r /var/log/varnish.log
The last two examples save your Varnish log to disk and read it back. Varnish keeps a circular log in memory in order to stay fast, which means old log entries are lost unless they are saved to a file for later review.
If you wanted to stop Varnish, you could do so with this command:
kill `cat /var/run/varnish.pid`
This will send the TERM signal to the process whose PID is stored in the /var/run/varnish.pid file. Because this is the varnishd manager process, Varnish will shut down.
Now that you know how to start and stop Varnish, and examine cache hits and misses, the natural question to ask is what does Varnish cache, and for how long?
Varnish is conservative with what it will cache by default, but you can
change most of these defaults. It will consider caching only GET and HEAD
requests. It won't cache a request with either a Cookie or Authorization
header. It won't cache a response with either a Set-Cookie or Vary
header. One thing Varnish looks at is the Cache-Control header. This
header is optional, and it may be present in the request or the response. It
may contain a list of one or more comma-separated directives. This
header is meant to apply caching restrictions. However, Varnish won't
alter its caching behavior based on the Cache-Control header, with
the exception of the max-age directive. This directive looks like
Cache-Control: max-age=n, where n is a number. If Varnish receives
the max-age directive in the back end's response, it will use that value
to set the cached response's expiration (TTL), in seconds. Otherwise,
Varnish will set the cached response's TTL to the value of
its default_ttl parameter, which defaults to 120 seconds.
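The TTL rule just described can be modeled in a few lines of shell. This is a simplified sketch, not Varnish's actual implementation: the function name response_ttl is made up for this example, and the match on max-age is case-sensitive for brevity.

```shell
# Print the TTL (in seconds) this model would assign to a response.
# $1 = value of the back end's Cache-Control header (may be empty)
# $2 = default TTL; 120 mirrors the default value of default_ttl
response_ttl() {
    default_ttl=${2:-120}
    # Extract the number that follows "max-age=", if the directive is present.
    max_age=$(printf '%s\n' "$1" | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p')
    # Fall back to the default when no max-age directive was found.
    echo "${max_age:-$default_ttl}"
}

response_ttl "public, max-age=300"
response_ttl "no-transform"
```

The first call prints 300 because max-age is present. The second prints 120 because the model falls back to the default.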
Varnish has configuration parameters with sensible defaults. For example, the default_ttl parameter defaults to 120 seconds. Configuration parameters are fully explained in the varnishd(1) man page. You may want to change some of the default parameter values. One way to do that is to launch varnishd with the -p option. The downside is that you have to stop and restart Varnish, which flushes the cache. A better way of changing parameters is by using what Varnish calls the management interface. The management interface is available only if varnishd was started with the -T option, which specifies the address and port on which the management interface should listen. You can connect to the management interface with the varnishadm command. Once connected, you can query parameters and change their values without having to restart Varnish.
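As a rough sketch, querying and changing a parameter through the management interface might look like the following. The address 127.0.0.1:6082 is an assumption and must match whatever was given to varnishd with -T. The wrapper names show_param and set_param are made up for this example, while param.show and param.set are the management commands they call.

```shell
# Assumed management address; it must match the value passed to varnishd -T.
VARNISH_ADMIN=${VARNISH_ADMIN:-127.0.0.1:6082}

# Show the current value of a parameter, e.g.: show_param default_ttl
show_param() {
    varnishadm -T "$VARNISH_ADMIN" param.show "$1"
}

# Change a parameter on the running instance, e.g.: set_param default_ttl 300
set_param() {
    varnishadm -T "$VARNISH_ADMIN" param.set "$1" "$2"
}
```

A change made this way takes effect on the running instance without flushing the cache. If varnishd was also started with a secret file (the -S option), varnishadm needs the same -S option to authenticate.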
To learn more, read the man pages for varnishd, varnishadm and varnish-cli.
You'll likely want to change what Varnish caches and how long it's cached for—this is called your caching policy. You express your caching policy in the default.vcl file by writing VCL. VCL stands for Varnish Configuration Language, which is like a very simple scripting language specific to Varnish. VCL is fully explained in the vcl(7) man page, and I recommend reading it.
Before changing default.vcl, let's think about the process Varnish goes through to fulfill an HTTP request. I call this the request/response cycle, and it all starts when Varnish receives a request. Varnish will parse the HTTP request and store the details in an object known to Varnish simply as req. Now Varnish has a decision to make based entirely on the req object—should it check its cache for a match or just forward the request to the back end without caching the response? If it decides to bypass its cache, the only thing left to do is forward the request to the back end and then forward the response back to the client. However, if it decides to check its cache, things get more interesting. This is called a cache lookup, and the result will either be a hit or a miss. A hit means that Varnish has a response in its cache for the client. A miss means that Varnish doesn't have a cached response to send, so the only logical thing to do is send the request to the back end and then cache the response it gives before sending it back to the client.
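The first decision in this cycle can be modeled with a toy shell function that applies the conservative defaults described earlier: only GET and HEAD requests without a Cookie or Authorization header are candidates for a cache lookup. This is an illustration of the decision, not Varnish's own logic, and the function name request_action and its output words are chosen for this example.

```shell
# Decide, from request details alone, whether to try the cache.
# $1 = HTTP method
# $2 = "yes" if the request carries a Cookie or Authorization header
request_action() {
    case $1 in
        GET|HEAD)
            if [ "$2" = "yes" ]; then
                echo pass      # bypass the cache and go to the back end
            else
                echo lookup    # check the cache; the result is a hit or a miss
            fi
            ;;
        *)
            echo pass          # other methods are never served from cache
            ;;
    esac
}

request_action GET no
request_action POST no
request_action GET yes
```

The three calls print lookup, pass and pass, in that order.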
Now that you have an idea of Varnish's request/response cycle, let's talk about how to implement your caching policy by changing the decisions Varnish makes in the process. Varnish has a set of subroutines that carry out the process described above. Each of these subroutines performs a different part of the process, and the return value from the subroutine is how you tell Varnish what to do next. In addition to setting the return values, you can inspect and make changes to various objects within the subroutines. These objects represent things like the request and the response. Each subroutine has a default behavior that can be seen in default.vcl. You can redefine these subroutines to get Varnish to behave how you want.