Protecting Your Site with Access Controls

Portions of your web site can be kept secure using user name and password combinations.

Retrieving a Protected Document

If we try to retrieve a protected document, things get a bit more complicated. (We will see how to protect documents in just a moment; for now, assume that it is possible to restrict access to documents on a web server.) My main workstation, running Red Hat Linux 4.2 and Apache 1.2.4, contains a “private” directory whose contents are restricted. Let's retrieve the contents of /private/, just as I requested the contents of /lj/ before.

From the shell prompt, I connect to the web server with the following:

telnet localhost 80

Once I am connected, I request the “private” directory:

GET /private/ HTTP/1.0

Instead of receiving the contents of the /private directory or the index.html file contained within /private, I get the following response:

HTTP/1.1 401 Authorization Required
Date: Mon, 26 Jan 1998 12:08:17 GMT
Server: Apache/1.2.4
WWW-Authenticate: Basic realm="TestRealmName"
Connection: close
Content-Type: text/html

<TITLE>401 Authorization Required</TITLE>
<H1>Authorization Required</H1>
This server could not verify that you
are authorized to access the document you
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.<P>
Connection closed by foreign host.

In other words, my request was rejected because I had not authenticated myself. But when did the server give me a chance to do so?

Herein lies the dirty little secret of user authentication: when you retrieve a protected document, your browser really has to request the document twice. The first time that it tries to retrieve the document, a browser receives a message similar to the one that we received above, marked with the response code 401 indicating that you need authorization in order to retrieve this document.
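To make the first half of that exchange concrete, here is a minimal Python sketch of what a client might do with the 401 response: read the status code and pull the scheme and realm out of the WWW-Authenticate header. The function name parse_challenge is my own invention for illustration, and the sketch assumes a single challenge of the simple form shown in the response above, not every variation a server may send.

```python
import re

def parse_challenge(raw_response):
    """Return (status_code, scheme, realm) from a raw HTTP response string.

    Hypothetical helper: assumes one WWW-Authenticate header of the
    form  Scheme realm="..."  and nothing more elaborate.
    """
    # Headers end at the first blank line; the body (if any) follows.
    head = raw_response.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    # The status line looks like: HTTP/1.1 401 Authorization Required
    status_code = int(lines[0].split(" ", 2)[1])
    scheme = realm = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "www-authenticate":
            match = re.match(r'\s*(\w+)\s+realm="([^"]*)"', value)
            if match:
                scheme, realm = match.group(1), match.group(2)
    return status_code, scheme, realm

response = (
    "HTTP/1.1 401 Authorization Required\r\n"
    "Server: Apache/1.2.4\r\n"
    'WWW-Authenticate: Basic realm="TestRealmName"\r\n'
    "Content-Type: text/html\r\n"
    "\r\n"
    "<TITLE>401 Authorization Required</TITLE>\r\n"
)
print(parse_challenge(response))
```

A client that sees status 401 with the Basic scheme knows it should ask the user for a name and password for the named realm and then repeat the request.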

Old or broken browsers stop at that point, presenting the server's error message to the user. Modern browsers that understand authentication present the user with a dialog box into which the user can type a user name and password. The browser then joins the user name and password with a colon, encodes the result in Base64 and repeats the request, this time with the encoded value in an “Authorization” header.

Modern browsers also save time by keeping track of user names and passwords that you have already entered. Thus, the first time you encounter a protected directory, you are prompted for your user name and password. The second time you retrieve a file from the same directory, you will not be prompted. Whether the browser waits to receive the 401 Authorization Required response before sending the saved user name and password, or sends them along with the request without waiting to be asked, depends on the implementation.
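The bookkeeping involved can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular browser does it: the class name CredentialCache is hypothetical, and keying the saved credentials by host and realm is my assumption.

```python
class CredentialCache:
    """Remember credentials per (host, realm), as a browser might.

    Hypothetical sketch: real browsers decide for themselves how
    long to keep credentials and how to match them to requests.
    """

    def __init__(self):
        self._store = {}

    def remember(self, host, realm, username, password):
        # Save the pair after the user has typed it into the dialog box.
        self._store[(host, realm)] = (username, password)

    def lookup(self, host, realm):
        # Returns None when the user has not yet been prompted for this realm.
        return self._store.get((host, realm))

cache = CredentialCache()
first_visit = cache.lookup("localhost", "TestRealmName")   # nothing saved yet
cache.remember("localhost", "TestRealmName", "reuven", "password")
second_visit = cache.lookup("localhost", "TestRealmName")  # no prompt needed
print(first_visit, second_visit)
```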

Thus, if my user name is “reuven” and my password is “password”, I can retrieve the contents of the /private/ directory by using TELNET to access port 80 on my local computer and entering:

GET /private/ HTTP/1.0
Authorization: Basic cmV1dmVuOnBhc3N3b3Jk

The first line is identical to what we have seen before; it indicates that we want to use HTTP 1.0 to retrieve the document named /private/ (which happens to be a directory, although the client does not know that) using the GET method. Rather than pressing Enter twice after the first line, we press it only once and then add a single additional header, followed by a blank line to end the request. This header begins with “Authorization:”, meaning that we are about to send authorization information to the server using the “Basic” scheme, which is nothing more than a Base64 encoding of the string username:password, built from the user name and password that the user entered.
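If you would rather not work out the Base64 value by hand, a short Python sketch can build the header. It uses only the standard base64 module; the function names basic_auth_header and decode_basic_token are my own, chosen for illustration.

```python
import base64

def basic_auth_header(username, password):
    """Build an Authorization header line for the Basic scheme."""
    # Join the two values with a colon, then Base64-encode the bytes.
    pair = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(pair).decode("ascii")
    return f"Authorization: Basic {token}"

def decode_basic_token(token):
    """Reverse the encoding: return (username, password)."""
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password itself may contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("reuven", "password")
print(header)
print(decode_basic_token(header.split(" ")[-1]))
```

The second function is there to make a point: anyone who sees the header can recover the user name and password just as easily, because Base64 is an encoding and not encryption.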

If the user name and password combination is accepted, the server returns the contents of the resource requested by the browser. If it is rejected, the same message (with response code 401) is returned to the user's browser. The browser can then allow the user to try again or can display the error message sent along with the 401 response.
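The decision the server makes can be sketched as follows. This is not Apache's code, only an illustration of the logic under some assumptions: the USERS table and the check_authorization function are hypothetical, and the password is stored in plain text purely to keep the example short.

```python
import base64
import hmac

# Hypothetical user table for illustration only; a real server
# would keep hashed passwords in a separate password file.
USERS = {"reuven": "password"}

def check_authorization(header_value, users=USERS):
    """Return 200 if the Authorization header value is valid, else 401."""
    if not header_value:
        return 401  # no credentials were sent at all
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both malformed Base64 and bytes that are not valid UTF-8.
        return 401
    username, sep, password = decoded.partition(":")
    expected = users.get(username)
    if not sep or expected is None:
        return 401
    # compare_digest avoids leaking information through timing differences.
    ok = hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))
    return 200 if ok else 401

good = "Basic " + base64.b64encode(b"reuven:password").decode("ascii")
bad = "Basic " + base64.b64encode(b"reuven:wrong").decode("ascii")
print(check_authorization(good), check_authorization(bad), check_authorization(None))
```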

In this case, the user name and password combination does indeed work, and the server returns the contents of /private/ (that is, the file /private/index.html) in the following manner:

HTTP/1.1 200 OK
Date: Mon, 26 Jan 1998 12:41:14 GMT
Server: Apache/1.2.4
Last-Modified: Mon, 26 Jan 1998 10:49:49 GMT
ETag: "1057-ca-34cc6a4d"
Content-Length: 202
Accept-Ranges: bytes
Connection: close
Content-Type: text/html

<Title>My private site</Title>
<H1>My private site</H1>
<P>This is my private site.
From here, you can get to
<a href="test.html">my test page</a>.</P>

The 200 status code at the top of the response indicates that everything has gone well and that the server was able to retrieve the document that we requested. As you can see from the Content-Type header (or simply by looking at the document's contents), the requested document contains HTML-formatted text. Were we to view this through our browser, we would undoubtedly see the text in different sizes.

