Here in the XenServer R&D group I’m working on a new open-source, xen-based virtualisation architecture (code-named “Windsor”). The goal is to exploit the awesome capabilities of the xen hypervisor to make a next-generation platform for IaaS cloud computing.

Why do I love xen?

Xen was originally created in the University of Cambridge Computer Lab’s xenoserver project, whose goal was:

to build a public infrastructure for wide-area distributed computing.

This was back in the days before the terms “cloud” and “IaaS” became popular. The goal of xen was to be:

a high-performance hypervisor… [which] forms the core of each Xenoserver node, providing the resource management, accounting and auditing that we require.

Of course, xen has been wildly successful doing exactly that in the public cloud.

From an architectural perspective, xen is great because it does one job and it does it well. Xen is a “type 1” or “bare metal” hypervisor which sits underneath all running VMs, isolating them from each other and controlling who gets to talk to the physical hardware. Xen isn’t an OS kernel; it doesn’t have a multitude of interfaces, VFS layers, block caches etc. It focuses on being a great hypervisor.

Windsor: Exploiting the capabilities of xen

The Windsor architecture will exploit the capabilities of xen to:

  • increase host security: services on the host will be split into separate VMs and deprivileged to decrease the size of the Trusted Computing Base (TCB). Xen will continue to provide rock-solid isolation between the VMs, limiting the effects of any compromise. In the xen community this technique is known as “domain 0 disaggregation”.
  • increase host scalability: just as modern applications can be designed to automatically scale-out across a public cloud, in Windsor host services will be able to scale-out across multiple VMs within the Windsor platform. For example, if the host storage service becomes overloaded, the host would start a second instance of the service and a transparent load-balancer to spread requests across the two.
  • increase availability: since even the most robust systems still occasionally encounter bugs and fail, services in Windsor will be designed to recover quickly. For example, device drivers will all be run in separate VMs (known as “driver domains”). Each of these VMs will have access to only one piece of hardware. If a device driver crashes, xen will contain the crash and the VM will be rebooted. Crucially, all the guest VMs will stay running and only notice a small interruption to their network or storage I/O.
  • better Quality of Service (QoS): the key to providing good QoS in the cloud is to prevent one tenant’s VMs from interfering with another tenant’s VMs, for example by exhausting the number of available I/O buffers in a shared storage stack. In Windsor, xen will isolate the network and storage stacks (“driver domains”) of each tenant so that the I/O requests are kept apart until they hit the physical hardware.
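To make the QoS point concrete, here is a toy model in Python of the "noisy neighbour" problem. It is only a sketch and not Windsor code: the class names, tenant names and buffer counts are all made up for illustration, and a simple counter stands in for a real pool of I/O buffers. It compares one storage stack shared by every tenant with one stack per tenant, which is roughly what per-tenant driver domains would give you.

```python
class BufferPool:
    """A fixed-size pool of I/O buffers (hypothetical; modelled as a counter)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def try_allocate(self) -> bool:
        """Take one buffer, or return False if the pool is exhausted."""
        if self.in_use >= self.capacity:
            return False
        self.in_use += 1
        return True


class SharedStack:
    """One storage stack, and therefore one buffer pool, for all tenants."""

    def __init__(self, capacity: int) -> None:
        self.pool = BufferPool(capacity)

    def submit(self, tenant: str) -> bool:
        # Every tenant competes for the same buffers.
        return self.pool.try_allocate()


class IsolatedStacks:
    """One stack per tenant, standing in for per-tenant driver domains."""

    def __init__(self, tenants: list[str], capacity_each: int) -> None:
        self.pools = {t: BufferPool(capacity_each) for t in tenants}

    def submit(self, tenant: str) -> bool:
        # A tenant can only ever exhaust its own buffers.
        return self.pools[tenant].try_allocate()


def noisy_neighbour(stack, noisy: str, quiet: str, burst: int) -> bool:
    """Let `noisy` issue `burst` requests that never complete,
    then report whether `quiet` can still submit a single request."""
    for _ in range(burst):
        stack.submit(noisy)
    return stack.submit(quiet)


# Same total number of buffers (8) in both configurations.
shared_ok = noisy_neighbour(SharedStack(8), "tenant-a", "tenant-b", burst=100)
isolated_ok = noisy_neighbour(
    IsolatedStacks(["tenant-a", "tenant-b"], 4), "tenant-a", "tenant-b", burst=100
)
print("shared stack, quiet tenant served:  ", shared_ok)
print("isolated stacks, quiet tenant served:", isolated_ok)
```

In the shared case the busy tenant takes every buffer and the quiet tenant's request is refused; in the isolated case the quiet tenant is unaffected. The trade-off, at least in this toy model, is that static partitioning gives each tenant a smaller slice of the total.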

The best is yet to come…

Expect to see some technical detail (including links to code) in some future blog posts!