Protecting Your Site with Access Controls
One of the wonderful things about the Web is that so much information is freely available. For the cost of a telephone call and a monthly bill from your Internet service provider, you can read hundreds of newspapers, get updates on the computer industry and listen to radio stations from your home town.
Even the most open, freely available site usually contains one or more sections that are not meant for public consumption. The reasons for cordoning off sections of the site can vary: Perhaps the webmaster wants a place to put his favorite hacks, a repository for testing new programs or a directory in which staff notices can be placed. If a site wants to charge for content or restrict access to members of an organization, the need for access control becomes even more obvious.
One popular way to handle these problems is to create a directory that others are unlikely to guess. But this approach, known as “security through obscurity”, only works as long as no one leaks the name of the hidden directory. A far more robust approach restricts access based on user name/password combinations.
This month, we will look at ways in which to restrict access to your server with the Web's standard user name/password authorization scheme. The principles should apply to any web server, but I will be using the freely available Apache web server (available at http://www.apache.org/) in my examples.
Access restrictions are part of HTTP, the protocol used in most web transactions. When your browser requests a document from a server using HTTP, it is usually returned immediately, preceded by several headers (i.e., name/value pairs) describing its length, the date on which it was last modified and the type of content it contains.
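To make the idea of headers as name/value pairs concrete, here is a minimal Python sketch that splits a raw HTTP/1.0 response into its status line, its headers and the document body. The sample response and all of its header values are made up for illustration, and the function name parse_response is hypothetical; a real client would normally rely on a library such as Python's http.client instead.

```python
from typing import Dict, Tuple

# A made-up HTTP/1.0 response, as a server might send it over the wire.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 31\r\n"
    "Last-Modified: Mon, 01 Jun 1998 12:00:00 GMT\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)


def parse_response(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Split a raw HTTP response into (status line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons.
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so store them in lowercase.
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

In this simplified version, a header that appears twice overwrites its earlier value; it is meant only to show the shape of the response, not to handle every case a real server might produce.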
HTTP's designers recognized that webmasters might want to restrict access to one or more directories. Since version 1.0, HTTP has included provisions for restricting access to parts of a web site.
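In HTTP/1.0 this provision is the “Basic” authentication scheme: the server answers a request for a protected document with a 401 Unauthorized status and a WWW-Authenticate header, and the browser repeats the request with an Authorization header carrying the user name and password joined by a colon and Base64-encoded. The Python sketch below shows only that client-side encoding and its reverse; the function names are hypothetical, and it assumes the credentials are UTF-8 text. The Aladdin/open sesame pair is the well-known example from the HTTP authentication RFCs.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    # Join user name and password with a colon, then Base64-encode the bytes.
    # (Assumption: credentials are encoded as UTF-8.)
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # User names may not contain a colon, so split on the first one only;
    # the password itself is allowed to contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Because Base64 is an encoding rather than encryption, anyone who can observe the request can recover the password, which is why this scheme is usually paired with an encrypted connection.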
Let's see how this protection works from a computer's view, first by looking at an unprotected site and then by looking at a protected one. Once we understand how access protection works, we can incorporate it into our own work.
Everything starts when a user asks the browser to retrieve a document. Whether the user types the URL into a text field, selects it from a list of bookmarks or clicks on a hyperlink in an existing page of HTML, the effect is the same. The browser takes the URL, dissects it into a protocol, a server and a document, and takes the appropriate action. In the case of a URL such as:
http://www.ssc.com/lj/
the protocol name is http, the server name is www.ssc.com, and the document name (really a directory) is /lj/. Most web servers are configured such that requesting a directory is the same as requesting the file index.html within that directory, so the above URL is effectively equivalent to this one:
http://www.ssc.com/lj/index.html
We can simulate the browser's actions by dissecting the URL on our own and requesting the document /lj/ from www.ssc.com using HTTP from the Linux command line. The telnet program is generally used to log in to a remote machine, most often to open a shell on that machine. By giving telnet an argument in addition to the machine name, we can specify the port to which we wish to connect. Since web servers listen on port 80 by default, we can connect to the web server on www.ssc.com by typing:
telnet www.ssc.com 80
When we establish a connection to that web server, we can enter an HTTP request. Each request starts with a line describing the action we wish to take (known as a “method”), the name of the document we wish to retrieve and the version of HTTP we are using. Beginning with HTTP 1.0, this initial line can be followed by one or more header lines containing information about the user's browser, the document types that the browser is willing to accept, HTTP cookies that may have been set in the past and other useful bits of information. For our purposes, it is enough to enter this line:
GET /lj/ HTTP/1.0
and then press Enter twice: once to end the line containing the request, and a second time to indicate that we have finished sending all of the headers and will now wait for a response from the server.
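The telnet session can also be reproduced in a few lines of Python. This sketch, with hypothetical function names dissect, build_request and fetch, splits a URL into protocol, server, port and document the way a browser would, then sends the same bare HTTP/1.0 request over a TCP socket and reads until the server closes the connection. It handles only plain http:// URLs, and the server named in the article may no longer respond as described.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def dissect(url: str) -> Tuple[str, Optional[str], int, str]:
    """Split a URL into (protocol, server, port, document)."""
    parts = urlsplit(url)
    port = parts.port or 80        # web servers listen on port 80 by default
    document = parts.path or "/"   # an empty path means the server's root
    if parts.query:
        document += "?" + parts.query
    return parts.scheme, parts.hostname, port, document


def build_request(document: str) -> bytes:
    # The request line (method, document, HTTP version) followed by an empty
    # line, mirroring pressing Enter twice in the telnet session.
    return f"GET {document} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send a bare HTTP/1.0 GET and return the raw response bytes."""
    scheme, host, port, document = dissect(url)
    if scheme != "http" or not host:
        raise ValueError("this sketch only handles http:// URLs")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(document))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, fetch("http://www.ssc.com/lj/") would return the raw response, headers included. Many servers today host several sites on one address and expect a Host: header, which this sketch omits to match the telnet session.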
If all goes well, the server will respond by returning a page of HTML. In this particular case, we will receive HTML-formatted text (as we can tell from the text/html Content-Type header at the top of the response) with the latest information about this very magazine. Your browser is responsible for taking the HTML returned by the server and displaying it for you.