Introduction

Cake celebrating the half-millionth XenRT job

XenRT, the Citrix XenServer automated test system, has reached a significant milestone this month: its 500,000th test job has run. This comes 8 years after XenRT began life testing the Xen hypervisor and XenSource’s XenEnterprise product in the summer of 2005.

XenRT stands for Xen Regression Test. It was conceived and built by a team headed by James Bulpin, then head of XenSource QA (later Citrix XenServer QA) and currently Senior Director of Technology for Citrix XenServer.

Over the years XenRT has evolved from an automated test system comprising a few tens of automated test cases running on a few servers, with the emphasis very much on regression testing core Xen hypervisor functionality, to a highly sophisticated test-as-a-service platform. The current incarnation of XenRT within Citrix runs test jobs on demand from the XenServer engineering group. Tests run on a huge distributed lab comprising a wide variety of hardware types. A team of some twenty-five developers in the UK and India spend the majority of their time either maintaining and operating XenRT or developing new automated test cases for it. However, one mark of its success as a test-as-a-service platform is that test cases are also commonly developed and run by the wider XenServer engineering community within Citrix. Each additional test case, once developed, is added to the ever-growing inventory of test cases available to the XenServer team.

Any engineer can use XenRT to request that some set of test cases (ranging from a single test case right up to any subset of the several thousand available) be run, and that the results be emailed back to them. Each such request is scheduled by XenRT, whereupon it is allocated a job ID and is fully trackable during execution.

In August 2013 we celebrated the 500,000th XenRT job!

Architecture

The core of XenRT is the scheduler. This is a piece of software (like all XenRT code, written in Python) which schedules requests from users, mapping them onto available lab hardware. Some jobs will require particular hardware – e.g. a specific CPU type or storage array. In the general case jobs will be scheduled to run on whatever hardware is first available that meets the constraints specified in the job. One beneficial side-effect of this is that a given test case, unless it has very specific constraints, will, over multiple runs, be exercised on different server types and hardware combinations, thus providing a degree of hardware diversity in the testing. This helps address one of the many challenges with testing an operating system like XenServer – that of validating its operation across the many different supported hardware types.
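
The constraint matching described above can be illustrated with a short Python sketch. This is not XenRT's actual scheduler code; the Machine and Job classes, the attribute names and the schedule function are all hypothetical, and the real scheduler will certainly handle far more (priorities, queues, multi-host pools). It only shows the basic idea of handing a job to the first free machine that meets its constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """A lab server (hypothetical model, not XenRT's own)."""
    name: str
    attributes: dict          # e.g. {"cpu": "intel", "storage": "iscsi"}
    busy: bool = False


@dataclass
class Job:
    """A test job with optional hardware constraints (hypothetical model)."""
    job_id: int
    constraints: dict = field(default_factory=dict)


def schedule(job, machines):
    """Return the first free machine that satisfies every constraint, or None."""
    for machine in machines:
        if machine.busy:
            continue
        if all(machine.attributes.get(key) == value
               for key, value in job.constraints.items()):
            machine.busy = True   # reserve the machine for this job
            return machine
    return None


lab = [Machine("host-a", {"cpu": "amd"}), Machine("host-b", {"cpu": "intel"})]
print(schedule(Job(1, {"cpu": "intel"}), lab).name)   # host-b
print(schedule(Job(2), lab).name)                     # host-a
print(schedule(Job(3, {"cpu": "intel"}), lab))        # None, host-b is busy
```

Because an unconstrained job takes whichever machine happens to be free, repeated runs of the same test case naturally land on different hardware, which is the diversity effect described above.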

What we call the “XenRT lab” is in fact multiple physical labs distributed across several geographies. The lab is further logically divided into “XenRT sites”, most sites comprising 16 servers, some comprising as many as 48. Each site is managed by a dedicated server known as a “XenRT Controller”. When scheduled, a job is handed over to the relevant XenRT controller. When it runs, it first goes through a setup phase where it orchestrates the requested test configuration on the host server or pool – this typically involves installing the required version of XenServer, installing the required guests and configuring the network and storage. Once set up, the tests themselves are executed, test logs and results are recorded to a database and defects are auto-generated in the XenServer defect tracking system (including automated de-duplication).

Some jobs are requests by individual developers or test engineers to run a particular test case. Some jobs are standard regression test suites, comprising hundreds or even thousands of test cases. Many of these suites are run continually to provide an ongoing quality view of the XenServer mainline code branch. The net effect is that the XenRT lab has a very high utilisation, effectively in use 24×7, 365 days per year.
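
The job lifecycle described above (setup, test execution, result recording, de-duplicated defect filing) might be sketched roughly as follows. Everything here is an assumption made for illustration: the source does not say how XenRT identifies duplicate defects, so the signature scheme (a hash of test case name plus error message), the DefectTracker class and the run_job function are hypothetical stand-ins rather than XenRT's real interfaces.

```python
import hashlib


def failure_signature(test_case, error_message):
    """Build a stable key so that repeated failures map to the same defect."""
    raw = f"{test_case}:{error_message}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()


class DefectTracker:
    """In-memory stand-in for a defect tracking system (hypothetical)."""

    def __init__(self):
        self.defects = {}   # signature -> list of job IDs that hit it

    def report(self, job_id, test_case, error_message):
        """Record a failure; return True only if it opens a new defect."""
        signature = failure_signature(test_case, error_message)
        is_new = signature not in self.defects
        self.defects.setdefault(signature, []).append(job_id)
        return is_new


def run_job(job_id, setup, tests, tracker):
    """Run setup, then each test; record results and file defects for failures."""
    setup()   # e.g. install the hypervisor and guests, configure network/storage
    results = {}
    for name, test in tests.items():
        try:
            test()
            results[name] = "pass"
        except Exception as exc:
            results[name] = "fail"
            tracker.report(job_id, name, str(exc))
    return results
```

With this sketch, two jobs that fail the same test case with the same error message produce a single defect entry listing both job IDs, rather than two separate defects.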

As well as functional test cases, XenRT has suites of test cases that measure XenServer performance characteristics, that test stability and performance characteristics at large scale (thousands of guests running across large XenServer pools), that test XenServer’s interoperability with other Citrix products such as XenDesktop, PVS and NetScaler and that test XenServer in continuous long term use. The XenRT codebase is object-oriented with lots of re-usable library code for common tasks in orchestrating and operating XenServer environments. This leads to an ever-decreasing cost of development of new test cases.

Summary

XenRT is a key tool for XenServer engineering – it provides the XenServer team with a continuous integration capability; it ensures rigorous regression testing of functional and non-functional system characteristics; it is 100% automated; it gets the most out of expensive capital equipment by ensuring very high lab utilisation; it provides a rich set of queryable logs and metrics; and it provides rich lab management and orchestration features. All of this allows the XenServer team to make releases more efficiently and in a considerably shorter timescale than would otherwise be the case. Achieving a similar level of test coverage purely through manual testing would be prohibitively expensive.

Where next?

XenRT will remain a key part of the QA strategy for Citrix XenServer. However, the next step on the XenRT journey is to allow the xenserver.org community to benefit from the same test-as-a-service characteristics that the Citrix XenServer team enjoys. Watch out for an imminent blog post on xenserver.org giving more details of this exciting development!