Integrating Web Applications with Apache

Content Integration

Once a remote application is integrated with an Apache server at the protocol level, it may be necessary to integrate content as well. The need usually shows up as URLs coded into HTML or JavaScript that point to a back-end server rather than to the user-facing server. The basic requirement is to search and replace bits of HTML or JavaScript content, so that it renders and behaves correctly when accessed through an Apache proxy. The module that accomplishes this is mod_substitute, specifically its Substitute directive. Substitute performs a simple regex substitution on the payload data of an HTTP response.

Before attempting to replace text, check whether the back-end web server compresses data before sending it over the network. If it does, your Substitute statements might not work, because mod_substitute will be searching for plain text within binary compressed data. To account for this, you can instruct Apache to decompress the data, manipulate the response and then re-compress it. This is done using the SetOutputFilter directive, which is part of Apache core functionality (the INFLATE and DEFLATE filters themselves are provided by mod_deflate). Here's how it works:


SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE

Reading the arguments from left to right, this tells Apache to INFLATE (decompress) the data from the back-end server, perform the substitute and DEFLATE (compress) the data before returning it to the end user.

The Substitute statement uses a regex substitute expression. As I mentioned previously, I found it easier to use the pipe-style substitute expression in Apache. To recap, the syntax is s|search|replace|options. Two options worth knowing are "i", which denotes a case-insensitive search, and "n", which forces the search value to be treated as a fixed string instead of a regex. Because the following example relies on regex groups, it uses only "i". Here's a common use example:


Substitute "s|(href=\"http)(://)back-end01\.test:8080|$1s$2public.test|i"

For this example, let's assume that the user-facing site (public.test) runs HTTPS, and the back-end server (back-end01.test) runs HTTP on port 8080. This would be a solution if the back-end web server returned hyperlinks that were specific to itself as opposed to the user-facing site. In the search portion of the regex substitute, this splits out two groups of text in parentheses: (href=\"http) and (://). These are blocks of text that you want preserved in the replace section of the regex. In the replace, you are inserting an "s" after http and replacing the hostname/port with the user-facing site name. After processing, the resulting string will be href="https://public.test. This will update hyperlinks that use "href" attributes (<a> and <link>). For <img> and <script> tags, you could use this same Substitute statement and replace "href" with "src". Another consideration would be to account for double or single quotes delimiting attribute values (href=' vs. href=").
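To make the href/src and quoting variations concrete, here is a minimal sketch of how they might be combined with the decompress/recompress filter chain. It assumes mod_proxy, mod_proxy_http, mod_deflate and mod_substitute are loaded, and that the whole site is proxied at "/" (the <Location> path and the ProxyPass lines are illustrative assumptions, not part of the original example). Substitute is placed inside a <Location> block because it is only accepted in directory context or .htaccess.

```apache
<Location "/">
    # Assumed proxy setup for the back-end server used in this article
    ProxyPass "http://back-end01.test:8080/"
    ProxyPassReverse "http://back-end01.test:8080/"

    # Decompress, rewrite, then re-compress the response body
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE

    # [\"'] matches either quote style; $1 keeps the attribute name and quote
    Substitute "s|(href=[\"'])http://back-end01\.test:8080|$1https://public.test|i"
    Substitute "s|(src=[\"'])http://back-end01\.test:8080|$1https://public.test|i"
</Location>
```

Two separate statements are used rather than a single (href|src) alternation because the pipe character is already serving as the delimiter of the substitute expression.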

Another application of Substitute is to extend the functionality of a page without manipulating the original source code. Consider the following example:


Substitute "s|(<body[^>]*>)|$1<div style=\"font-size:14pt;font-weight:bold;background-color:#ff0000;color:#ffffff;display:block;text-align:center;\">This site will be down for 24 hours beginning at 8 pm tonight</div>|i"

If a website needs to be taken off-line for maintenance, this is an easy way to alert the user population of the outage without modifying the application itself. This example simply inserts a red bar along the top of the page (right after the <body> tag), which displays information about the outage. Depending on how your page is rendered, you might need to choose another tag to act as your starting point instead of <body>.
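If the proxied application also serves JavaScript, CSS or other non-HTML responses, it may be worth limiting the banner to HTML only. The following is a rough sketch of one way to do that, assuming Apache 2.4 with mod_filter loaded (which provides AddOutputFilterByType there); the <Location> path and the shortened banner markup are placeholders. If the back-end server compresses its responses, the INFLATE and DEFLATE handling described earlier is still needed.

```apache
<Location "/">
    # Run the SUBSTITUTE filter only on responses served as text/html
    AddOutputFilterByType SUBSTITUTE text/html

    # Insert the notice right after the opening <body> tag
    Substitute "s|(<body[^>]*>)|$1<div style=\"text-align:center;\">Maintenance tonight at 8 pm</div>|i"
</Location>
```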

Streamlining Future Integrations

All of the topics presented here can be configured and maintained relatively easily if you have only a few statements. In the real world, there typically will be many sites that use a similar configuration and having to define the functionality for each site can be time-consuming and can lead to mistakes. Luckily, Apache provides a mechanism to repeat functionality throughout your configuration through the use of mod_macro. The <Macro> directive within an Apache config functions very much like a function or subroutine. Once a macro is defined, it can be referenced as many times as is necessary, leaving you with one place within your config to maintain your detailed functionality. Here's an example macro:


<Macro RedirectSecure $host $path>
        RewriteCond "%{REQUEST_URI}" "^$path"
        RewriteRule "^/(.*)$" "https://$host/$1"
</Macro>

When called, this macro defines a RewriteCond and a RewriteRule that redirect any request whose URL starts with the value of the $path argument to https://$host/$1, where $host is the hostname specified as a macro argument and $1 is the requested URL path without its leading slash. The following syntax would be used to call this macro:


Use RedirectSecure public.test /users
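As a sketch of what that call looks like in context, the following places it inside a plain-HTTP virtual host. The <VirtualHost> address, the ServerName and the RewriteEngine line are assumptions added for illustration; the comments show the two directives the Use line expands to.

```apache
<VirtualHost *:80>
    ServerName public.test

    # mod_rewrite must be switched on wherever the rules end up
    RewriteEngine On

    Use RedirectSecure public.test /users
    # The Use line above is equivalent to writing:
    #   RewriteCond "%{REQUEST_URI}" "^/users"
    #   RewriteRule "^/(.*)$" "https://public.test/$1"
</VirtualHost>
```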

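Something to consider is the location within the Apache config from which a macro is called. The contents of a macro are inserted wherever the Use statement appears, so every directive inside the macro must be valid in that context. RewriteRule is accepted in server config, virtual host, directory and .htaccess context, but Substitute, for example, is accepted only in directory context (such as inside a <Location> block) or .htaccess. If a macro is called from a place where one of its directives is not allowed, Apache will report an error and not start. Here's another example: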


<Macro ReplaceContentURL $backendurl $publicurl>
        Substitute "s|(href=\")$backendurl|$1$publicurl|i"
        Substitute "s|(src=\")$backendurl|$1$publicurl|i"
</Macro>

This macro expands on the replacing of URLs that I covered previously. This will search for tag attributes of "href" and "src" and replace the hyperlinks of the back-end server with that of the user-facing server. Here's an example of how this might be called:


Use ReplaceContentURL http://back-end01.test:8080 https://public.test

This will search for http://back-end01.test:8080, beginning with either href=" or src=" and replace the URL with https://public.test. Macros can be used for any piece of Apache configuration. They can be used to do small tasks as shown here as well as whole site configurations. Although macros are pretty simple, they make the difference between a large amount of difficult-to-maintain configuration files and a simplified reusable configuration.
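To illustrate the "whole site" end of that range, here is a minimal sketch of a macro that wraps an entire proxied virtual host. The macro name ProxiedSite, its parameter names and the second and third hostnames are hypothetical, and mod_proxy and mod_proxy_http are assumed to be loaded.

```apache
<Macro ProxiedSite $publichost $backendurl>
<VirtualHost *:80>
    ServerName $publichost

    # Send every request for this site to its back-end server
    ProxyPass "/" "$backendurl/"
    ProxyPassReverse "/" "$backendurl/"
</VirtualHost>
</Macro>

# One line per site; the details live in a single place
Use ProxiedSite public.test http://back-end01.test:8080
Use ProxiedSite reports.public.test http://back-end02.test:8080
```

Adding a new site then means adding one Use line, and a change to the macro body applies to every site that uses it.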

At this point, you have some basic knowledge of integrating HTTP, customizing content and reproducing configuration within Apache. Although many directives and modules weren't covered here, this should be a solid starting point for accessing your applications through Apache.

Resources

The following are some articles I've found useful along with some example Apache configs I've written.

Apache Module Reference (2.2): http://httpd.apache.org/docs/2.2/mod

Apache Module Reference (2.4): http://httpd.apache.org/docs/2.4/mod

Git Instaweb Reverse Proxy: https://gist.github.com/bng44270/cff67619db3e3f915957

Monit Reverse Proxy: https://gist.github.com/bng44270/287277ea1975b9a3e3526d5a5bcb017c

Adobe Experience Manager Apache Config: https://github.com/bng44270/aem-dispatcher-config

______________________

Andy Carlson has worked in IT for the past 13 years doing networking and server administration. He and his amazing wife have three daughters and a son, and they currently reside in Cincinnati, Ohio.