HTTP in 44k with libhttp

by Alan DuBoff

on January 1, 2002

Working with web servers is something that most of us will find ourselves doing much more in the future. HTTP is becoming more and more needed, even for non-web embedded devices, since it's often the best way to re-flash a device over the Net. Port 80, the standard HTTP port, is more often than not left open on firewalls. Because of this I've often found it safe to use HTTP to transfer data through firewalls. Port 80 is usually our friend.

But, inside a device, we often find we don't have the resources or the power in some cases to support the use of a large, slow HTTP library. Many of the libraries available have some nice bells and whistles in the way of features and function. The curl library, for instance, has support for secure HTTP transfers, and that definitely is needed in many situations. However, the curl library also has a lot of other features that are not needed by every embedded device.

A library that is a bit more lean, but has fewer features, is the GNOME HTTP library. However, the library seems to make copies of the data for transfer and storage. It requires that you initialize and maintain the request, and the documentation is very sparse. Even with these limitations, the library does work and is fairly easy to use.

I've run into a low-resource situation when re-flashing a device over the Net. The images needed are often 8MB or even 16MB these days, and the available RAM places constraints on the way we download and store that data before writing it to Flash. In one case, I really needed a small footprint, and I ended up writing some socket code to perform such a task. Sadly, the code was contained within a proprietary piece of code I wrote while working for another company.

With this experience in mind, I set out to create a small compact library that everyone could use, and I decided to do it by starting out with a program called httpget I found on the Net some time back. I converted it to a shared library, called libhttp.

Size: Really Matters in a Lot of Situations

By rolling our own HTTP transfer, we can bring down the requirements quite a bit, and this code will be useful to many people who need only the basic features. Let's look at the sizes of these shared libraries to see, in hard numbers, where the real value lies. Note that this is the non-SSL version of the curl library, and that all three libraries were compiled for the Intel x86 architecture. Binaries built for a RISC processor are often larger than their x86 counterparts:

322521 Sep 21 19:03 libcurl.so.2.0.1
110479 Sep 21 19:36 libghttp.so.1.0.0
 45508 Oct 23 01:30 libhttp.so.1.0.0

Size is by far the biggest (or smallest, as it may be) reason to use libhttp to begin with. The numbers will vary quite a bit between processors, and they also change depending on whether one links statically or dynamically. Libhttp is small enough to fit in almost any embedded device that has such a need.

Caller Is Required to Free Memory

Libhttp should compile on almost any platform that supports gcc. Any C or C++ application should be able to call it to perform the transfer and get back an allocated pointer. The burden is on the caller to free the memory that has been allocated. If the memory is not freed, large memory leaks will build up quickly, so let this be the first word of advice I can offer with this library: the caller is required to free memory.

For this article I will not cover all of the methods of HTTP, but will focus on the three common methods currently implemented in libhttp. Those methods are GET, POST and HEAD.

The GET method, the most common one, is limited to a request size of 8k, or 8,192 bytes. The POST method is not limited in size for the request but requires that you use the Content-Length header as a part of your request. This header informs the server how large the request is. In the case of the GET method, the server should end up truncating the request if the size goes over the 8k limit. Most times this will result in an HTTP error.
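To make the Content-Length requirement concrete, here is a minimal sketch of how a POST request could be formatted by hand. The function name build_post_request(), the HTTP/1.0 request line and the Content-Type value are assumptions chosen for illustration; this is not how libhttp itself builds its requests.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libhttp: format a minimal POST
 * request into buf. Content-Length is computed from the body so the
 * server knows how many bytes follow the blank line.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int build_post_request(char *buf, size_t buflen, const char *host,
                       const char *path, const char *body)
{
    size_t body_len = strlen(body);
    int n = snprintf(buf, buflen,
                     "POST %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Content-Type: application/x-www-form-urlencoded\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"          /* blank line ends the headers */
                     "%s",
                     path, host, (unsigned long)body_len, body);

    if (n < 0 || (size_t)n >= buflen)
        return -1;                   /* output was truncated */
    return n;
}
```

The body itself can be of any length; only the buffer passed in by the caller limits it in this sketch.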

Another difference between the GET and POST methods is that a GET request should result in the same response from the server, even when called in succession. The POST may or may not produce the same results, and often a web server will take multiple requests and have some type of data handler sort things out for the response. The POST method is often used with forms.

For most requests, the GET method works well. If you do need to pass a large request, you'll be glad that the POST method is available. The HEAD method is like GET except that it gets only the header information, not the content. It can be used for checking the date on which some resource was last modified without actually getting it.

For the purpose of this article I am using these methods as a means to transfer data only. The data can be anything from a binary program, MP3 audio or MPEG video to a PNG image. As I've mentioned previously, Port 80 seems to be our friend in this regard, and few places will block streaming HTTP, even if they do require a proxy.

I found some source code while looking for a solution that I could use in my embedded device. A program I stumbled across is called httpget, and it was written in 1994 by Sami Tikka. What's interesting is that this source code was written about the same time the browser was being introduced to the masses, not long after Linux was first created.

For the most part, this code should configure, build and install on almost every UNIX/Linux platform. That seems to be the case for all of the systems I have to test it on. I use a Debian woody system for development, currently with a 2.4.9 kernel. I am using gcc 2.95.4 natively, and gcc 2.95.2 to cross compile to a PowerPC 823 chip. I know there is a more recent cross compiler available, but this compiler has worked well and will most likely work until I can get around to upgrading.

To configure libhttp for running on Linux x86 natively, we can run the following commands:

./configure
make
make install

To cross compile for a specific target, such as the PowerPC I use, pass the target to configure as the host and build with the following:

./configure --host=powerpc-linux
make clean
make

If you are cross compiling, do not install libhttp as the binaries probably will not run on your development host. Instead, copy the library to your target.

The original httpget, on which libhttp is based, just outputs the data to stdout. I modified it so that it will allocate a chunk of memory and then store the data to that memory. Currently, it allocates memory for the entire size of the data to transfer, dynamically reallocating as it reads the data. This is the simplest type of interface to call since it doesn't require that the caller allocate the memory beforehand.

The following defines set the size of the buffer used to read data from the socket and the length of the transfer buffer in which those reads are stored. I have placed these in header defines to make it easy to change their sizes. Depending on the type of data and the requirements placed on the library, these values might need changing:

#define BUFLEN 8192
#define XFERLEN 65536

Libhttp first allocates a 64k chunk of memory, then reads from the socket in 8k chunks and dynamically reallocates an additional 64k whenever more room is needed. This provides eight reads for each additional reallocation of memory. I have done quite a bit of testing with these values, but you can change them to suit yourself. Most web pages will fit in the first 64k chunk allocated.
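The growth strategy described above can be sketched roughly as follows. This is not libhttp's actual code: the function name read_all() is hypothetical, and it reads from a stdio stream rather than a socket so that the example stays self-contained. Only the BUFLEN and XFERLEN values come from the article.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFLEN  8192    /* size of each read, as in the article */
#define XFERLEN 65536   /* growth step of the transfer buffer */

/*
 * Hypothetical sketch: read everything from `in` into a heap buffer
 * that grows in XFERLEN steps. On success, returns the buffer (which
 * the caller must free) and stores the byte count in *out_size.
 * Returns NULL on allocation failure or read error.
 */
char *read_all(FILE *in, long *out_size)
{
    char chunk[BUFLEN];
    size_t cap = XFERLEN, used = 0, n;
    char *data = malloc(cap);

    *out_size = 0;
    if (data == NULL)
        return NULL;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (used + n > cap) {
            /* Out of room: add another 64k. Keep the old pointer
               so it can still be freed if realloc fails. */
            char *bigger = realloc(data, cap + XFERLEN);
            if (bigger == NULL) {
                free(data);
                return NULL;
            }
            data = bigger;
            cap += XFERLEN;
        }
        memcpy(data + used, chunk, n);
        used += n;
    }
    if (ferror(in)) {
        free(data);
        return NULL;
    }
    *out_size = (long)used;
    return data;
}
```

Because each read is at most BUFLEN bytes and the buffer grows by XFERLEN, a single reallocation always makes enough room for the pending chunk.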

If you look at the source code for hget.c, you will see that it is very simple to call http_request(). You pass it the HTTP URL, it connects to the server, and the response is returned in HTTP_Response.pData. Along with the URL, you pass any additional entities to be placed in the request header and an enum for the HTTP method type.

Proxy

The http_proxy environment setting can be used to set a proxy server. If set, this will specify a proxy server in the format of:

http://proxy-pita.my.net:8080/

You can set that in bash with:

$ export http_proxy=http://proxy-pita.my.net:8080/

http_proxy is a common environment variable used to designate the proxy server for many applications.
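As an illustration of how an application might pick up this setting, the sketch below reads http_proxy with getenv() and splits it into a host and a port. The helper name get_proxy() and the fallback to port 80 are assumptions made for this example; libhttp's own parsing may differ.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from libhttp: split the value of
 * http_proxy ("http://host:port/") into a host name and a port.
 * Returns 1 if a proxy was found and parsed, 0 otherwise.
 */
int get_proxy(char *host, size_t hostlen, int *port)
{
    const char *env = getenv("http_proxy");
    const char *p, *end;
    size_t len;

    if (env == NULL || *env == '\0')
        return 0;                     /* no proxy configured */

    p = env;
    if (strncmp(p, "http://", 7) == 0)
        p += 7;                       /* skip the scheme */

    end = p + strcspn(p, ":/");       /* host ends at ':' or '/' */
    len = (size_t)(end - p);
    if (len == 0 || len >= hostlen)
        return 0;
    memcpy(host, p, len);
    host[len] = '\0';

    /* Assumption of this sketch: default to port 80 if none is given */
    *port = (*end == ':') ? atoi(end + 1) : 80;
    return *port > 0;
}
```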

Syntax of http_request()

The syntax of http_request() is

HTTP_Response http_request( char *in_URL,
char *in_ReqAddition, HTTP_Method in_Method );

where char *in_URL is a valid HTTP URL, and char *in_ReqAddition is a pointer to additional HTTP headers, if there are any to be sent. This could be one of the commonly used entities such as "If-Modified-Since", "If-Match" or "If-Range". The HTTP request is very specific as to how new-line characters are interpreted: two in a row are placed at the end of a request. For this reason, it is very important that we format the additional entities properly. You must not put a new line after the last of your additions, but you should put one between multiple entities. I figure most people will use a single entity, such as "If-Modified-Since".

HTTP_Method in_Method is the method enum, as defined in the http.h header file.
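The new-line rule for in_ReqAddition can be illustrated with a small helper that joins several header lines, placing a separator between them but not after the last one. The name join_headers() is hypothetical, and the choice of "\r\n" as the separator in the usage note is an assumption based on the HTTP line ending; check the libhttp source for what it actually expects.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libhttp: join header lines with
 * `sep` between them and nothing after the last one.
 * Returns 0 on success, -1 if `out` is too small.
 */
int join_headers(char *out, size_t outlen, const char *const *hdrs,
                 size_t count, const char *sep)
{
    size_t used = 0, i;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < count; i++) {
        /* The separator goes in front of every header except the first */
        int n = snprintf(out + used, outlen - used, "%s%s",
                         i ? sep : "", hdrs[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;               /* would not fit */
        used += (size_t)n;
    }
    return 0;
}
```

Called with two entities and "\r\n" as the separator, this yields a single string with one line break in the middle and none at the end, which is the shape the paragraph above asks for.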

HTTP_Response Structure

Listing 1. Making an HTTP GET Request

The HTTP_Response structure shown in Listing 1 is how http_request() returns the data to the caller. Notice that the declarations use character arrays rather than pointers for the response code and message. I chose this so the caller does not have to worry about allocating those structure members or freeing them later. You do have to worry about pData, since http_request() allocates it. Even when a request fails, data may have been transferred for the error HTML, and it needs to be freed just like the data from any other request. Whether or not the request succeeds, the caller should always check whether pData is nonzero and free it if necessary:

#define HCODESIZE 4
#define HMSGSIZE 32

typedef struct
{
   char *pData;             //  pointer to data
   long lSize;              //  size of data allocated
   int  iError;             //  error upon failures
   char *pError;            //  text description
                            //  of error
   char szHCode[HCODESIZE]; //  http response code
   char szHMsg[HMSGSIZE];   //  message/description
                            //  of http code
} HTTP_Response, *PHTTP_Response;

A few notes regarding the above:

pData: pointer to data transferred. http_request will allocate/reallocate the memory dynamically as it transfers data; pData will be a pointer to that data. It is important to understand this as the caller is responsible to free the memory.
lSize: the size of pData. This value will be zero until a successful transfer is completed.
iError: this will be set to errno upon system errors.
pError: pointer to the text for iError as returned from strerror().
szHCode[HCODESIZE]: the HTTP response code (e.g., 200, 404, 303, etc.).
szHMsg[HMSGSIZE]: the HTTP message associated with the response code: OK, Not Found, Not Modified, etc.
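The cleanup rule described in these notes can be wrapped in a pair of small helpers. This is only a sketch: response_ok() and response_release() are hypothetical names, the structure is copied from the article so the example compiles on its own, and it assumes pData comes from malloc()/realloc() so that free() is the right way to release it.

```c
#include <stdlib.h>
#include <string.h>

#define HCODESIZE 4
#define HMSGSIZE 32

/* Copied from the article; real code would include libhttp's http.h. */
typedef struct
{
   char *pData;             /* pointer to data */
   long lSize;              /* size of data allocated */
   int  iError;             /* error upon failures */
   char *pError;            /* text description of error */
   char szHCode[HCODESIZE]; /* http response code */
   char szHMsg[HMSGSIZE];   /* message/description of http code */
} HTTP_Response;

/* Hypothetical helper: nonzero when the response code is 2xx. */
int response_ok(const HTTP_Response *r)
{
    return r->szHCode[0] == '2';
}

/*
 * Hypothetical helper: free pData if it was allocated, whether or not
 * the request succeeded, and reset the fields so that calling this a
 * second time is harmless.
 */
void response_release(HTTP_Response *r)
{
    if (r->pData != NULL) {
        free(r->pData);
        r->pData = NULL;
    }
    r->lSize = 0;
}
```

After a call to http_request(), a caller would test response_ok(), use pData and lSize as needed, and then always call response_release(), including on the error path where pData may hold the server's error page.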

Practical Examples

Three examples of using http_request(), hget, hhead and hpost, are included in the source tarball. All three use a very similar syntax, call http_request(), and write the data to a file. You can use the source code of these programs as an example of how to use libhttp.

Although there is various other functionality that would be nice to implement, libhttp should provide a simple way to add support for HTTP data transfers to your software.

Alan DuBoff has been fiddlin' with computers for close to 20 years. He specializes in UNIX/Linux development. He was recently composing software for the Kerbango Internet Radio using Embedded Linux. His hobbies include sailing, music, ping-pong, slotcars and, of course, hacking on computer devices of all types. He can be reached by e-mail at maestro@SoftOrchestra.com.
