Virtualization in Xen 3.0
Editor's Note: This article has been updated since its original posting.
Virtualization has existed for over 40 years. Back in the 1960s, IBM developed virtualization support on a mainframe. Since then, many virtualization projects have become available for UNIX/Linux and other operating systems, including VMware, FreeBSD Jail, coLinux, Microsoft's Virtual PC and Solaris's Containers and Zones.
The problem with these virtualization solutions is low performance. The Xen Project, however, offers impressive performance results--close to native--and this is one of its key advantages. Another impressive feature is live migration, which I discussed in a previous article. After much anticipation, Version 3.0 of Xen recently was released, and it is the focus of this article.
The main goal of Xen is achieving better utilization of computer resources and server consolidation by way of paravirtualization and virtual devices. Here, we discuss how Xen 3.0 implements these ideas. We also investigate the new VT-x/VT-i processors from Intel, which have built-in support for virtualization, and their integration into Xen.
The idea behind Xen is to run guest operating systems not in ring 0, but in a higher and less privileged ring. Running guest OSes in a ring higher than 0 is called "ring deprivileging". The default Xen installation on x86 runs guest OSes in ring 1, termed Current Privilege Level 1 (or CPL 1) of the processor. It runs a virtual machine monitor (VMM), the "hypervisor", in CPL 0. Applications run in ring 3 without any modification.

About 250 instructions are contained in the IA-32 instruction set, of which 17 are problematic when run in ring 1. These instructions can be problematic in two ways. First, running the instruction in ring 1 can cause a general protection exception (GPE), also called a general protection fault (GPF). For example, running HLT immediately causes a GPF. Some instructions, such as CLI and STI, cause a GPF only if a certain condition is met: the CPL is numerically greater than the I/O privilege level (IOPL) of the current program or procedure, meaning the code has less privilege than the IOPL requires.
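The two fault rules above can be written down as a toy model. The Python sketch below is not Xen or processor code. The function names are illustrative assumptions, and the model ignores virtual-8086 mode and the protected-mode virtual interrupt extensions.

```python
def cli_sti_faults(cpl: int, iopl: int) -> bool:
    """Return True if CLI or STI would raise a general protection fault.

    Toy model of the protected-mode rule: the instruction is allowed
    only when CPL <= IOPL (a numerically lower ring has more privilege).
    """
    if not (0 <= cpl <= 3 and 0 <= iopl <= 3):
        raise ValueError("x86 privilege levels range from 0 to 3")
    return cpl > iopl


def hlt_faults(cpl: int) -> bool:
    """Return True if HLT would fault: it is allowed only in ring 0."""
    if not 0 <= cpl <= 3:
        raise ValueError("x86 privilege levels range from 0 to 3")
    return cpl != 0
```

With a guest kernel in ring 1 and IOPL set to 0, `cli_sti_faults(1, 0)` is true. Raising IOPL to 1 would let the same instruction run without a fault.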
The second problem occurs with instructions that do not cause a GPF but still fail. Many Xen articles use the term "fail silently" to describe these cases. For example, POPF fails silently when the restored EFLAGS has a different interrupt flag (IF) value than the current EFLAGS: in ring 1 the change to IF is simply ignored, and no exception is raised.
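The POPF case can be made concrete with a small model. The Python sketch below is a simplification under stated assumptions: it tracks only the IF bit and the IOPL field of EFLAGS, ignores reserved bits and virtual-8086 mode, and the function name `popf` is illustrative rather than taken from any real code base.

```python
IF_FLAG = 1 << 9       # interrupt enable flag (bit 9 of EFLAGS)
IOPL_MASK = 3 << 12    # I/O privilege level field (bits 12-13 of EFLAGS)


def popf(current: int, popped: int, cpl: int) -> int:
    """Return the EFLAGS value after POPF loads `popped` at the given CPL.

    Bits the caller may not change keep their current value.
    No exception is raised, which is the "silent" part of the failure.
    """
    iopl = (current & IOPL_MASK) >> 12
    protected = 0
    if cpl != 0:
        protected |= IOPL_MASK   # only ring 0 may change IOPL
    if cpl > iopl:
        protected |= IF_FLAG     # the IF change is dropped silently
    return (popped & ~protected) | (current & protected)
```

A guest kernel in ring 1 that tries to disable interrupts this way, with IOPL at 0, ends up with IF still set and receives no notification that anything went wrong.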
How does Xen handle these problematic instructions? In some cases, such as the HLT instruction, the instruction in ring 1--where the guest OSes run--is replaced by a hypercall. For example, consider the cpu_idle() method in sparse/arch/xen/i386/kernel/process.c. Instead of calling the HLT instruction, as is done eventually in the Linux kernel, we call the xen_idle() method. It performs a hypercall instead, namely, the HYPERVISOR_sched_op(SCHEDOP_block, 0) hypercall.
A hypercall is Xen's analog to a Linux system call. A system call is an interrupt (0x80) issued to move from user space (CPL 3) to kernel space (CPL 0). A hypercall also is an interrupt (0x82). It passes control from ring 1, where the guest domains run, to ring 0, where Xen runs. The implementations of a system call and a hypercall are quite similar. Both pass the number of the syscall/hypercall in the eax register. Passing other parameters is done in the same way. In addition, both the system call table and the hypercall table are defined in a file named entry.S in their respective source trees.
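The dispatch step shared by system calls and hypercalls can be sketched as a table lookup keyed on the value in eax. The Python model below is a toy under stated assumptions: the hypercall numbers, handler names and the `do_hypercall` entry point are hypothetical, and the real table lives in assembly in entry.S. The first argument is read from ebx, following the 32-bit x86 register convention.

```python
from typing import Callable, Dict

# Hypothetical hypercall numbers, chosen only for this sketch.
HC_SCHED_BLOCK = 0
HC_FPU_TASKSWITCH = 1

ENOSYS = 38  # Linux errno value for "function not implemented"


def sched_block(arg: int) -> int:
    """Stand-in handler: pretend the domain blocked and was woken again."""
    return 0


def fpu_taskswitch(set_ts: int) -> int:
    """Stand-in handler: pretend the TS flag was updated."""
    return 0


# The table maps a hypercall number to its ring-0 handler.
HYPERCALL_TABLE: Dict[int, Callable[[int], int]] = {
    HC_SCHED_BLOCK: sched_block,
    HC_FPU_TASKSWITCH: fpu_taskswitch,
}


def do_hypercall(regs: Dict[str, int]) -> int:
    """Model of the ring-0 entry point reached through the software interrupt."""
    handler = HYPERCALL_TABLE.get(regs["eax"])
    if handler is None:
        return -ENOSYS               # unknown hypercall number
    return handler(regs["ebx"])      # first argument travels in ebx
```

A guest would load the number into eax, the argument into ebx, and trigger the interrupt. Here that becomes `do_hypercall({"eax": HC_SCHED_BLOCK, "ebx": 0})`.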
You can batch several hypercalls into one multicall by building an array of multicall_entry_t structs, one per hypercall, and then issuing a single hypercall, HYPERVISOR_multicall. This way, the number of entries to and exits from the hypervisor is reduced. Of course, reducing such interprivilege transitions when possible results in better performance. The netback virtual driver, for example, uses this multicall mechanism.
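The saving from batching is easy to see in a toy model that counts ring transitions. In the Python sketch below, `MulticallEntry` is only loosely modeled on multicall_entry_t (an operation number, its arguments and a result slot). The `ToyHypervisor` class and its two arithmetic "operations" are hypothetical stand-ins, not Xen interfaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MulticallEntry:
    """One batched call: operation number, arguments, and a result slot."""
    op: int
    args: List[int] = field(default_factory=list)
    result: int = 0


class ToyHypervisor:
    """Counts guest-to-hypervisor transitions; operations are placeholders."""

    def __init__(self) -> None:
        self.transitions = 0
        # Hypothetical operations: 0 adds its arguments, 1 multiplies them.
        self.table = {0: lambda a, b: a + b, 1: lambda a, b: a * b}

    def hypercall(self, op: int, *args: int) -> int:
        self.transitions += 1            # one entry and exit per call
        return self.table[op](*args)

    def multicall(self, entries: List[MulticallEntry]) -> int:
        self.transitions += 1            # one entry and exit for the whole batch
        for entry in entries:
            entry.result = self.table[entry.op](*entry.args)
        return 0
```

Issuing two operations separately costs two transitions in this model, while a two-entry batch costs one. Each entry carries its own result back to the guest.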
Here's another example: the CLTS instruction clears the task switch (TS) flag in CR0. Like HLT, this instruction causes a GPF when issued in ring 1. But the CLTS instruction itself is not replaced by a hypercall. Instead, it is delegated to ring 0 in the following way. When it is issued in ring 1, we get a GPF. This GPF is handled by do_general_protection(), located in xen/arch/x86/traps.c. Note that do_general_protection() is the hypervisor handler, which runs in ring 0. From there, do_general_protection() calls emulate_privileged_op(). Under certain circumstances, this handler scans the opcode of the faulting instruction. In the case of CLTS, whose second opcode byte is 0x06 (the full encoding is 0F 06), it calls do_fpu_taskswitch(0). Eventually, do_fpu_taskswitch(0) executes the CLTS instruction, but this time from ring 0. Note: be sure _VCPUF_fpu_dirtied is set to enable this.
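This trap-and-emulate path can be sketched as a toy model. The Python code below borrows the function names mentioned in the text, but the bodies are simplified stand-ins and not the code in xen/arch/x86/traps.c. The `ToyCpu` class and the returned strings are assumptions made for illustration. CLTS is matched by its two-byte encoding, 0F 06.

```python
CR0_TS = 1 << 3  # task-switched flag (bit 3 of CR0)


class ToyCpu:
    """Holds just the CR0 register for this model."""

    def __init__(self, cr0: int = CR0_TS) -> None:
        self.cr0 = cr0


def do_fpu_taskswitch(cpu: ToyCpu, set_ts: int) -> None:
    """Runs in ring 0: set or clear TS on the guest's behalf."""
    if set_ts:
        cpu.cr0 |= CR0_TS
    else:
        cpu.cr0 &= ~CR0_TS   # the effect CLTS has when run in ring 0


def emulate_privileged_op(cpu: ToyCpu, insn: bytes) -> bool:
    """Decode the faulting instruction; return True if it was emulated."""
    if len(insn) >= 2 and insn[0] == 0x0F and insn[1] == 0x06:  # CLTS
        do_fpu_taskswitch(cpu, 0)
        return True
    return False


def do_general_protection(cpu: ToyCpu, cpl: int, insn: bytes) -> str:
    """Ring-0 GPF handler: try emulation for faults raised by a ring-1 guest."""
    if cpl == 1 and emulate_privileged_op(cpu, insn):
        return "emulated"
    return "fault delivered to guest"
```

In this model, a ring-1 guest that issues CLTS faults into the handler, which clears TS for it and reports the instruction as emulated. An instruction the decoder does not recognize is reported back to the guest as a fault.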
Those who are curious about further details can look at the emulate_privileged_op() method in that same file, xen/arch/x86/traps.c. The instructions that may "fail silently" usually are replaced by others.
Comments
Novell virtualization information page
Understand me now!
Novell offers various networking and virtualization solutions including 'SUSE linux enterprise' which has the added benefit of being able to support numerous operating systems such as Linux, Netware and Windows in unison (by sharing the same physical servers) due to Novell's collaboration with Microsoft. Users are therefore provided with the best virtualization platform for Windows server consolidation. Novell's virtualization software also includes an integrated suite of tools for virtualization management and automation.
Here is a link to the Novell virtualization information page (http://www.novell.com/linux/virtualization/) using the link text virtualization or novell virtualization. I strongly believe that you readers will benefit from the networking and virtualization information and support offered by our website.
I think this Nick Page guy
I think this Nick Page guy is right, I was just thinking the same myself. I checked out Novell's site and its filled with quality info. I love open source!
As per the comment on
As per the comment on FreeBSD Jail, Solaris Zones have a very low overhead usually <1%.
typo
There is a typo in the first paragraph under "paravirtualization":
"The applications run in ring 4 without any modification."
I believe that should be "ring 3."
FreeBSD Jails have _no_
FreeBSD Jails have no performance impact! It's simply another technique with other uses.
Have you ever tried OpenVZ
Have you ever tried OpenVZ project?
It is much easier to use and lets you run more Virtual Servers than Xen.
Easier, maybe, but if performance matters
Perhaps it's easier for home usage or simple installs for your own infrastructure. If you simply need a hosted and installed OS on a good connection, you should look for a VPS. My finding was that OpenVZ servers I've rented were much slower than those from Xen providers. I recommend BudgetDedicated.com's Xen offerings.
--
Johan
I was hoping to see more on alternative operating systems
Since the VT and Pacifica support was supposed to be the enabler for being able to load WinXP, etc. and run it inside Xen.
The Hypervisor really needs to be integrated into the Linux kernel code...it's too much of a pain to keep patching kernels as they're released...
I agree Xen can be hard to
I agree Xen can be hard to set up manually.
On the other hand, kernel and other needed binaries are often shipped with most major distros.
Thanks for useful article
I was wondering about Xen support on AMD, and this article was very useful. Keep up the good work.