Containers—Not Virtual Machines—Are the Future Cloud
The Problem with VMs
Here are the penalties we currently pay for VMs:
Running a whole separate operating system to get a resource and security isolation.
Slow startup time while waiting for the OS to boot.
The OS often consumes more memory and more disk than the actual application it hosts. The Rackspace Cloud recently discontinued 256MB instances because it didn't see them as practical. Yet, 256MB is a very practical size for an application if it doesn't need to share that memory with a full operating system.
As for slow startup time, many cloud infrastructure users keep spare virtual machines around to accelerate provisioning. Virtual machines have taken the window from requesting to receiving resources down to minutes, but having to keep spare resources around is a sign that it's still either too slow or not reliable enough.
So What's the Difference?
VMs are fairly standardized; a system image running on one expects mostly the same things as if it had its own bare-metal computer. Containers are not very standardized in the industry as a whole. They're very OS- and kernel-specific (BSD jails, Solaris Zones, Linux namespaces/cgroups). Even on the same kernel and OS, options may range from security sandboxes for individual processes to nearly complete systems.
VMs don't have to run the same kernel or OS as the host and they obtain access to resources from the host over virtualized devices—like network cards—and network protocols. However, VMs are fairly opaque to the host system; the hypervisor has poor insight into what's been written but freed in block storage and memory. They usually leverage hardware/CPU-based facilities for isolating their access to memory and appear as a handful of hypervisor processes on the host system.
Containers have to run the same kernel as the host, but they can optionally run a completely different package tree or distribution. Containers are fairly transparent to the host system. The kernel manages memory and filesystem access the same way as if it were running on the host system. Containers obtain access to resources from the host over normal userland/IPC facilities. Some implementations even support handing sockets from the host to the container of standard UNIX facilities.
VMs are also heavyweight. They require a full OS and system image, such as EC2's AMIs. The hypervisor runs a boot process for the VM, often even emulating BIOS. They usually appear as a handful of hypervisor processes on the host system. Containers, on the other hand, are lightweight. There may not be any persistent files or state, and the host system usually starts the containerized application directly or runs a container-friendly init dæmon, like systemd. They appear as normal processes on the host system.
Figure 3. Virtualization decoupled provisioning from hardware deployment. Containers decouple provisioning from OS deployment and boot-up.
Containers Now Offer the Same Features as VMS, but with Minimal Overhead
Compared to a virtual machine, the overhead of a container is disruptively low. They start so fast that many configurations can launch on-demand as requests come in, resulting in zero idle memory and CPU overhead. A container running systemd or Upstart to manage its services has less than 5MB of system memory overhead and nearly zero CPU consumption. With copy-on-write for disk, provisioning new containers can happen in seconds.
Containers are how we at Pantheon maintain a consistent system architecture that spans free accounts up to clustered, highly available enterprise users. We manage an internal supply of containers that we provision using an API.
We're not alone here; virtually every large PaaS system—including Heroku, OpenShift, dotCloud and Cloud Foundry—has a container foundation for running their platform tools. PaaS providers that do rely on full virtual machines have inflexibly high infrastructure costs that get passed onto their customers (which is the case for our biggest competitors in our market).
The easiest way to play with containers and experience the difference is by using a modern Linux OS—whether locally, on a server or even on a VM. There is a great tutorial on Fedora by Dan Walsh and another one with systemd's nspawn tool that includes the UNIX socket handoff.
Evolving Containers in Open Source
While Pantheon isn't in the business of providing raw containers as a service, we're working toward the open-source technical foundations. It's in the same spirit as why Facebook and Yahoo incubated Hadoop; it's part of the foundation we need to build the product we want. We directly contribute as co-maintainers to systemd, much of which will be available in the next Red Hat Enterprise Linux release. We also send patches upstream to enable better service "activation" support so containers can run only when needed.
We have a goal that new installations of Fedora and other major distributions, like Ubuntu, provide out-of-the-box, standard API and command-line tools for managing containers, binding them to ports or IP addresses and viewing the resource reservation and utilization levels. This capability should enable other companies eventual access to a large pool of container hosts and flavors, much as Xen opened the door to today's IaaS services.
One way we'll measure progress toward this goal is how "vanilla" our container host machines are. If we can prepare a container host system by just installing some packages and a certificate (or other PKI integration), we'll have achieved it. The free, open-source software (FOSS) world will be stronger for it, and Pantheon will also benefit by having less code to maintain and broader adoption of container-centric computing.
There's also an advantage to this sort of FOSS contribution versus Cloud Foundry, OpenShift or OpenStack, which are open-source PaaS and LaaS stacks. What Pantheon is doing with projects like systemd redefines how users manage applications and resources even on single machines. The number of potential users—and, since it's FOSS, contributors—is orders of magnitude larger than for tools that require multi-machine deployments to be useful. It's also more sustainable in FOSS to have projects where 99% of the value reaches 99% of a large, existing user base.
Efficiency demands a future of containers running on bare-metal hardware. Virtual machines have had their decade.
David Strauss is the CTO and co-founder of Pantheon, whose all-for-one-and-one-for-all improvements to the Drupal infrastructure have made the largest Drupal Web sites in the world more scalable and secure.
- Linux Kernel Testing and Debugging
- NSA: Linux Journal is an "extremist forum" and its readers get flagged for extra surveillance
- Tails above the Rest, Part III
- Wanted: Your Embedded Linux Projects
- The 101 Uses of OpenSSH: Part I
- RSS Feeds
- Are you an extremist?
- Pass on Passwords with scp
- Dolphins in the NSA Dragnet
- Linux Journal Archives