A Recipe for Making Cookies
The overwhelming majority of URLs begin with the letters “http”, which stands for “hypertext transfer protocol”. Just as e-mail is transferred using SMTP (Simple Mail Transfer Protocol) and files are often retrieved using FTP (File Transfer Protocol), files written in HTML are generally transmitted using HTTP.
Why did the Web's inventors create a new protocol for transmitting hypertext, rather than sticking with previous ones? One answer is that they were interested in allowing servers to respond quickly and efficiently to requests from browsers. The client (browser) side of an HTTP transaction consists of a request for a document, optionally accompanied by headers that describe things such as the content types the browser will accept and the modification date of any copy it already holds. The server responds to the request by describing the document, including its content type, and returning the document. Once the document is sent, the server closes the connection. Because the two sides exchange a minimum of information and then break the connection, documents are transmitted with low overhead and thus at a relatively fast clip.
This “statelessness” (the fact that each connection is used to transmit a single document and that each transaction takes place in a vacuum) was a terrific idea in the early days of the Web. It meant that browsers and servers had to keep track of very little information when transmitting documents, thus reducing the size and increasing the speed of these programs.
As a result, if we look at the access log from a typical web server, we see a list of document requests as well as the IP address (i.e., the number that uniquely identifies a computer on the Internet) of the computer from which the request originated. We do not, however, know whether three requests made from the same computer at roughly the same time were made by the same person or by three different people.
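The ambiguity can be sketched in a few lines of Python (used here purely for illustration). The log lines below are invented, written in the widely used Common Log Format with documentation-only IP addresses. The function name hits_per_address is hypothetical.

```python
from collections import Counter

# Hypothetical access-log lines in Common Log Format; the addresses,
# timestamps, and paths are invented for illustration.
LOG_LINES = [
    '192.0.2.10 - - [01/Jan/1997:10:00:01 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.10 - - [01/Jan/1997:10:00:05 +0000] "GET /about.html HTTP/1.0" 200 2211',
    '192.0.2.10 - - [01/Jan/1997:10:00:09 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '198.51.100.7 - - [01/Jan/1997:10:00:12 +0000] "GET /index.html HTTP/1.0" 200 1043',
]


def hits_per_address(lines):
    """Count requests per client IP address (the first field of each line)."""
    return Counter(line.split()[0] for line in lines)


counts = hits_per_address(LOG_LINES)
for address, hits in counts.items():
    print(address, hits)
```

The count for 192.0.2.10 shows three requests, but nothing in the log reveals whether they came from one person or from three people sharing that machine.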
In many cases this would not be a problem; after all, if my web site is set up to serve out pages of HTML, then I probably don't care whether 1,000 different people have visited my site or if the same person has read 1,000 documents. For many sites statelessness does not present any obstacles.
However, many site owners, particularly commercial ones, are increasingly frustrated with the Web's inherent statelessness. It is much easier to sell advertising when you have a precise count of the number of people visiting your site, rather than a list of how many times each document was accessed. The number of “hits”, or individual HTTP requests received by a server, is a reasonable measure of a site's success only in the non-profit and personal sector; commercial sites are far more interested in how many pages were viewed by a given number of individuals.
Even small personal sites occasionally like to keep track of users. If you want to personalize a user's view of your site, you need a way to keep track of each user's preferences rather than a single setting that applies to all users. And, while you could certainly get a user's name (and password, if necessary) via HTML forms, forcing the user to enter this information on every page, or even upon arriving at your site's home page, would be a great burden.
This month we will look at one of the most popular ways to keep track of user state, best known as HTTP cookies. Cookies allow servers to store small pieces of data on the user's computer, and thus to keep track of a user's movements on a site. Note that while cookies can be used to keep track of a user's movements, and potentially to build a profile which might be of use to advertisers, they cannot collect any information which the user does not provide. Fears of privacy abuse might be justified in some cases (and designers should recognize that cookies will offend and upset some users), but the fear that cookies can somehow collect information from your computer without your knowledge is off the mark. Cookies simply make it much easier to create interesting sites.
Cookies are small (up to 4KB) pieces of data stored on the user's computer by the browser. In addition to a name-value pair, cookies are tagged with expiration dates limiting the length of time they may be stored, as well as an indicator of the Internet host or domain that originally created the cookie.
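As a rough illustration of those pieces, the following Python sketch uses the standard library's http.cookies module to assemble a cookie with a name-value pair, an expiration, and a domain. The cookie name visitor, its value, the host www.example.com, and the helper build_cookie are all made up for the example; the size check simply applies the 4KB figure given above.

```python
from http.cookies import SimpleCookie

MAX_COOKIE_BYTES = 4096  # the 4KB ceiling mentioned in the text


def build_cookie(name, value, domain, lifetime_seconds):
    """Return a SimpleCookie holding one name-value pair plus its tags."""
    if len(name) + len(value) > MAX_COOKIE_BYTES:
        raise ValueError("cookie exceeds 4KB")
    cookie = SimpleCookie()
    cookie[name] = value                        # the name-value pair
    cookie[name]["domain"] = domain             # host or domain that owns it
    cookie[name]["expires"] = lifetime_seconds  # an integer is treated as seconds from now
    return cookie


# Hypothetical values: a visitor ID that should live for one week.
cookie = build_cookie("visitor", "12345", "www.example.com", 7 * 24 * 60 * 60)
header = cookie.output(header="Set-Cookie:")
print(header)
```

The printed line is the header a server could send; the exact expiration date in it depends on when the code is run.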
The basic rule to remember when dealing with cookies is that the value of a cookie is set by the server using HTTP responses, and browsers return those values using HTTP requests. It's a bit disconcerting to think of things this way; we are not used to responses from servers containing a request of their own.
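This round trip can be sketched in Python without any network traffic. The three functions below are hypothetical stand-ins for the server and the browser, and the cookie name and value are invented. Only http.cookies.SimpleCookie comes from the standard library.

```python
from http.cookies import SimpleCookie


def server_response_headers():
    """Headers a server might send; the second one sets a cookie."""
    return ["Content-type: text/html", "Set-Cookie: visitor=12345"]


def browser_store(headers, jar):
    """Copy any cookies found in the response headers into the jar."""
    for header in headers:
        name, _, value = header.partition(":")
        if name.strip().lower() == "set-cookie":
            jar.load(value.strip())


def browser_request_header(jar):
    """Build the Cookie header the browser sends with its next request."""
    pairs = "; ".join(f"{morsel.key}={morsel.value}" for morsel in jar.values())
    return f"Cookie: {pairs}"


jar = SimpleCookie()                      # plays the role of the browser's cookie store
browser_store(server_response_headers(), jar)
print(browser_request_header(jar))
```

The value travels in the response under Set-Cookie and comes back in a later request under Cookie, which is the reversal of roles described above.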
Let's say that we have a CGI program that returns a small bit of HTML when invoked. Assuming that the program is in the /cgi-bin directory and is called sample.pl, our browser would retrieve it by connecting to the server on port 80 and issuing a request like this one:
GET /cgi-bin/sample.pl HTTP/1.0
This request says that we are using HTTP 1.0 and would like the server to send us the document /cgi-bin/sample.pl. The server, because of its configuration options, knows that anything in /cgi-bin is a program, and so it executes sample.pl, returning the output. Here is an example of what sample.pl might return:
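A server's first steps with such a request might look roughly like the Python sketch below. It is a simplification under stated assumptions: the function names are hypothetical, and the /cgi-bin/ prefix test stands in for whatever configuration a real server uses to decide which paths are programs.

```python
def parse_request_line(line):
    """Split an HTTP request line into method, path, and protocol version."""
    method, path, version = line.split()
    return method, path, version


def is_program(path, cgi_prefix="/cgi-bin/"):
    """Assume, as in the text, that anything under /cgi-bin is a program."""
    return path.startswith(cgi_prefix)


method, path, version = parse_request_line("GET /cgi-bin/sample.pl HTTP/1.0")
print(method, path, version, is_program(path))
```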
HTTP/1.0 200 OK
Content-type: text/html

<HTML>
<Head><Title>Test</Title></Head>
<Body><P>Test</P></Body>
</HTML>

The above is about as minimal as a modern HTTP transaction can get. A single header (Content-type) following the status code and preceding the message body is returned. Most of the time, more information is included in the response headers, such as the server name and version number and the date on which the document was created.

If the server wants to set a cookie on the browser's computer, it must include an additional header, named Set-cookie. Just as the Content-type header defines the type of data that is being returned in the response, the Set-cookie header defines the name and value for a cookie that applies to the site from which the response originated.
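To show where the extra header sits, here is a minimal Python sketch that assembles a response like the one above and optionally adds a cookie. The function build_response and the cookie visitor=12345 are invented for illustration. The header is spelled Set-Cookie in the code, which is equivalent because HTTP header names are not case-sensitive.

```python
def build_response(body, cookie_name=None, cookie_value=None):
    """Assemble a status line, headers, a blank line, and the body."""
    lines = ["HTTP/1.0 200 OK", "Content-type: text/html"]
    if cookie_name is not None:
        # The cookie travels as one more header, before the blank line.
        lines.append(f"Set-Cookie: {cookie_name}={cookie_value}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


BODY = "<HTML><Head><Title>Test</Title></Head><Body><P>Test</P></Body></HTML>"

# Hypothetical cookie: name "visitor", value "12345".
response = build_response(BODY, "visitor", "12345")
print(response)
```

Everything before the first blank line is header material, so a browser can pick up the cookie without ever showing it to the user.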
For example, Listing 1 contains a short program (cookie-test.pl) that creates a cookie on the user's computer. If we run cookie-test.pl from a web browser, we see the HTML output produced by the program. If it were not for the program's polite indication that it had set a cookie, we would never know unless we asked our browser to warn us each time. (I tried this feature on discovering it in Netscape Navigator 3.0, but I quickly turned it off when I discovered how often such dialog boxes were interfering with my web browsing and how innocuous most of them appeared to be.)