Your CMS Is Not Your Web Site

A content management system is a centralized repository for your content. A Web site is a composite of decentralized fragments that are assembled on the edge, in just-in-time fashion as the content is being delivered to users. If it's not a Web site, what does a CMS do?

First and foremost, the job of a CMS is (not surprisingly) to manage your content. It keeps content in raw form, separate from the presentation layer in which it eventually should appear. A CMS also allows you to deliver content in multiple formats, such as JSON, RSS and Atom feeds. Many legacy and proprietary content management systems rely on creating static HTML output to use for a Web site, but most newer or open-source content management systems are developed in a way that they can be queried directly and return Web-friendly markup.
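As a rough illustration of keeping raw content separate from its presentation, the following Python sketch renders one stored item as both JSON and an HTML fragment. The Article type and the to_json and to_html helpers are hypothetical names chosen for this example; they are not part of Drupal, WordPress or any other CMS.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass
class Article:
    """Raw content as a CMS might store it, with no markup attached."""
    title: str
    body: str


def to_json(article: Article) -> str:
    """Serialize the raw content for API or feed consumers."""
    return json.dumps(asdict(article), sort_keys=True)


def to_html(article: Article) -> str:
    """Wrap the same content in markup, escaping it for safe display."""
    return "<article><h1>{}</h1><p>{}</p></article>".format(
        html.escape(article.title), html.escape(article.body)
    )
```

The stored item never changes; only the function applied at delivery time decides which format the requester receives.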

As is often the case with PHP content management systems, such as Drupal and WordPress, the CMS can output content via a Web server. This leads to the perception that the CMS is, in fact, the Web site. In the case of small Web sites, the difference is negligible and not worth discussing. However, for larger Web sites (and most certainly for enterprise sites), the difference becomes unavoidable. A CMS like Drupal is installed in a Web server's docroot, and content is requested just like any other Web page. A good CMS will handle things like granting users access to content, content administration, theming and even accepting user-submitted content, such as comments and blog posts. Apache accepts a GET request and passes that on to Drupal, which then interprets it using a variety of menu callbacks to determine which content is desired. It's smart enough to check user access controls first to determine whether the requested content is allowed, and then decides to deliver it. The theme engine within the CMS renders HTML around the content and returns an HTML document back to Apache for delivery to the end user.
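The request flow just described (resolve the path, check access, then theme the result) can be sketched in a few lines of Python. This is a simplified model, not Drupal's actual code: the ROUTES table, the permission strings and the handle_get and render_theme functions are all assumptions made for the example.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical route table: path -> (required permission, content callback).
ROUTES: Dict[str, Tuple[str, Callable[[], str]]] = {
    "/node/1": ("access content", lambda: "Hello from the CMS"),
    "/admin": ("administer site", lambda: "Admin dashboard"),
}


def render_theme(content: str) -> str:
    """Stand-in for the theme engine: wrap content in an HTML document."""
    return "<html><body><main>{}</main></body></html>".format(content)


def handle_get(path: str, permissions: Set[str]) -> Tuple[int, str]:
    """Resolve the path, check access, then theme the content."""
    route = ROUTES.get(path)
    if route is None:
        return 404, render_theme("Not found")
    required, callback = route
    if required not in permissions:
        # Access is checked before the content callback ever runs.
        return 403, render_theme("Access denied")
    return 200, render_theme(callback())
```

Calling handle_get("/node/1", {"access content"}) returns a 200 status with the themed document, while the same permissions on "/admin" produce a 403.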

So What Is a Web Site?

Broadly speaking, a Web site is a collection of interlinked HTML documents that lets users move easily from one page to related ones. But in its more abstract form, and in the form most people expect from a Web site, it is the location on the Internet that users interact with using a Web browser. What the user actually interfaces with could be static HTML files, HTML generated via PHP, page updates using AJAX and JSON data, or even Flash or Java plugins fetching content from a Web service.

In the end, a Web site is that huge abstract thing users see and interact with when they put your URL in their browsers and something comes back. While clicking around your Web site, looking for the information they're after, end users have little to no concern about HTML or your CMS; they care only that the content is delivered in a seamless fashion.

Although the content itself makes up the majority of the Web site, there also are ancillary components that go along with serving that content. These necessarily include the Web server, content delivery networks (CDNs), front-end cache systems and even Web browser support. All of these things may change what users see after the content has left the content management system.

It helps to think of your Web site as being composed of several categories, or buckets. Your content management system is, indeed, a bucket unto itself. You also should have buckets for caching (Varnish, CDNs and Memcache). Additionally, you may have another bucket for hosted integrations, such as Facebook and other social network plugins, comment systems and statistical analytics. There even may be buckets for mobile apps. Every time visitors view a page from your Web site, they are pulling from each one of these categories.

However, as far as the Web server is concerned, the "end user" in question may not actually be the person sitting at the Web browser. The requester may be a load balancer that manages requests to multiple Web servers that serve the same content. The load balancer then in turn returns its version of the document to a front-end cache like Varnish or a CDN. Varnish reads the content and decides that additional holes need to be filled and fetches more content.

Why Should My Web Site Be Decentralized?

Content management systems written in scripting languages such as PHP tend to be very fast to develop and quick to update with new content, but their tendency toward frequent dynamic page generation raises performance concerns. For small amounts of traffic, internal caching usually is sufficient. Even when serving static content, Web servers can scale only so far, because they have well-defined hardware limits: the amount of available RAM divided by the amount of memory allocated to a thread (as with PHP's memory_limit value) sets a rough upper bound on the number of simultaneous users that can access the server. Adding multiple Web servers behind a load balancer, enabling a front-end cache and using a CDN will improve the scalability of the Web site significantly.
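The RAM-divided-by-per-thread-memory rule above is simple enough to work through in code. The figures below (8GB available to the Web server, 128MB per request) are assumed purely for illustration, and the function name is hypothetical. Because memory_limit is a ceiling rather than typical usage, the result is best read as a conservative estimate.

```python
def max_concurrent_requests(available_ram_mb: int, per_request_mb: int) -> int:
    """Estimate simultaneous requests as available RAM // memory per request."""
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    return available_ram_mb // per_request_mb


# Assumed figures: 8192 MB free for the Web server, 128 MB per PHP request.
print(max_concurrent_requests(8192, 128))  # 64
```

At those numbers, a single server tops out at 64 simultaneous dynamic requests, which is why the additional buckets matter long before traffic becomes extreme.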

Adding more buckets increases Web site capacity. Adding more buckets also changes the results of the request from the CMS into what is ultimately delivered as the Web site. This is where edge computing becomes an integral part of your Web site. To improve performance even more, it may become necessary to move some components, such as user registration and commenting systems, out of the content management system entirely, and place these things in some of your other buckets. For instance, a common approach is to have the CMS serve user-agnostic content with placeholders for the "Welcome so-and-so" messages. These placeholders can be edge-side includes (ESIs) that then are replaced by user-facing cache systems. Varnish supports a subset of ESI once processing is enabled for a response, and a number of CDNs support it as well. ESI fragments can live anywhere else on the Web, so the CMS no longer has to manage user registration, or even be aware of it.

Edge-side includes (for example, <esi:include src="http://example.com/fragment.html" onerror="continue"/>) are very similar to the server-side includes (SSIs) used with static HTML files served by Apache. ESI is a specification that edge-side computing and caching systems implement. This makes it possible to manage content in your primary CMS, manage users and user-generated content in a secondary system, and include other outlying content from tertiary systems. With this approach, your Web site actually may consist of several disparate content management systems assembled on the edge prior to delivery to end users.
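To make the assembly step concrete, here is a minimal Python sketch of what an edge cache does with an include tag: it finds each placeholder and substitutes the fragment fetched from the src URL. This is not how Varnish or any CDN is implemented. The assemble function and its fetch callback are hypothetical, the regular expression handles only simple self-closing tags, and every failure is treated as if onerror="continue" were set.

```python
import re
from typing import Callable

# Matches self-closing <esi:include src="..."/> tags with optional extra attributes.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"[^>]*/>')


def assemble(page: str, fetch: Callable[[str], str]) -> str:
    """Replace each ESI include tag with the fragment returned by fetch(src)."""
    def substitute(match: "re.Match[str]") -> str:
        try:
            return fetch(match.group(1))
        except Exception:
            # Simplification: behave as if onerror="continue" were always set.
            return ""
    return ESI_INCLUDE.sub(substitute, page)


# Fragments would normally come from other systems over HTTP; a dict stands in here.
fragments = {"http://example.com/greeting": "Welcome, Alice"}
page = '<p><esi:include src="http://example.com/greeting" onerror="continue"/></p>'
print(assemble(page, fragments.__getitem__))  # <p>Welcome, Alice</p>
```

The CMS that produced the page never sees the greeting; it only emits the placeholder, and whichever system answers for the src URL supplies the user-specific part.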

Once this page is assembled and delivered, it still may not be the Web site with which users ultimately interact. After the content arrives, the client browser may make even more alterations. References to JavaScript on the page may invoke widgets and other hosted integrations, such as Facebook "like" buttons, social-media "share" buttons, feedback forms and user commenting systems. Using jQuery and other JavaScript frameworks is a common way to invoke additional content delivered from other content delivery services without your content management system having to control any of it.

Software as a Service (SaaS) is another bucket that sometimes makes up a Web site. Using on-demand functionality like SaaS, you easily can handle decentralized elements, such as user comments or breaking-news alerts. In this case, AJAX makes additional requests to your service and interprets the JSON response (because why must a CMS deliver just HTML?), and jQuery renders the results into the existing page. With SaaS, it is ultimately the browser that adds the finishing touches to the conglomeration that is your Web site.

