I’ve been collaborating with Cisco, Microsoft, and NetApp solution architects to create a Citrix XenDesktop reference implementation that customers can deploy as a complete VDI solution. We are focused on scalability testing of this joint solution stack: Citrix XenDesktop 5.6 and Provisioning Services (PVS) 6.1 using Microsoft Private Cloud on Cisco FlexPod, which incorporates Cisco UCS blade servers, switches, and NetApp storage. The goal of our testing was to build and validate a 2000-user XenDesktop environment, confirming usability of an enterprise-level solution at that scale.
In examining workload scalability, we achieved high desktop density with the reference implementation design. Test results showed linear scalability, supporting 145 users on a single blade using the FlexPod configuration and scaling up to 2000 users on fourteen blades hosting desktop environments. The reference architecture is fully documented, along with testing procedures and results, as a Cisco Validated Design (CVD) named “Citrix XenDesktop on FlexPod with Microsoft Private Cloud.” (For detailed reports, see the Cisco and Microsoft URLs at the end of this blog article.)
The figure above depicts the validated configuration and its components:
- Cisco FlexPod, a predesigned configuration built on Cisco UCS servers, Cisco switches and fabric interconnects, and an integrated set of storage and software components. The reference implementation consists of two chassis units that house fourteen Cisco B230 M2 half-width, dual-socket blades.
- Microsoft Private Cloud, including Microsoft System Center 2012, Microsoft Hyper-V Server 2008, and Windows 7 Enterprise for the virtual desktops.
- NetApp FAS 3240 dual-controller storage system, configured with 600 GB, 10,000 RPM SAS drives for Fibre Channel LUNs and 256 GB of intelligent flash memory cache.
- Citrix XenDesktop 5.6 configured with redundant desktop controllers, along with three instances of Provisioning Services (PVS) 6.1. PVS creates a single desktop operating system image (vDisk) that can be streamed to multiple desktops. XenDesktop was used to deploy virtual machines to support all infrastructure components in addition to the user desktop virtual machines.
Since the solution is based on the modular FlexPod platform that integrates computing, networking, and storage elements, it can scale up or out to match site and project performance goals. You can add more resources to each individual FlexPod chassis or deploy additional FlexPod units to match workloads as user populations and performance needs increase.
Sizing and Configuration Details
In designing a VDI solution that scales and still delivers reasonable response times, it’s important to configure a sufficient number of IOPS across three key operational phases: virtual machine boot, ramp-up (the period when users log in), and ultimately the steady state of typical user operations. In our testing, we used the following VDI sizing guidelines for “Normal” users, allowing each Windows 7 VM a single virtual CPU, 1.5 GB of memory, and approximately 8-12 IOPS at steady state. Note that these are general-purpose sizing guidelines that should be adapted for the specific hardware and workloads in each environment.
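As a rough illustration, the per-desktop guidelines above can be turned into a back-of-the-envelope host sizing sketch. The per-VM figures below are the ones quoted in this article; the 8:1 vCPU oversubscription ratio and 16 GB hypervisor memory overhead are assumptions for illustration only:

```python
# Per-desktop "Normal" user guidelines quoted above (illustrative only).
VCPUS_PER_VM = 1
MEM_GB_PER_VM = 1.5
IOPS_PER_VM = 12          # upper end of the 8-12 steady-state range

def desktops_per_host(host_cores, vcpu_ratio, host_mem_gb, mem_overhead_gb=16):
    """Estimate how many desktops one host can hold, gated by CPU and memory."""
    by_cpu = (host_cores * vcpu_ratio) // VCPUS_PER_VM
    by_mem = (host_mem_gb - mem_overhead_gb) // MEM_GB_PER_VM
    return int(min(by_cpu, by_mem))

def steady_state_iops(desktops):
    """Aggregate steady-state IOPS the storage must sustain."""
    return desktops * IOPS_PER_VM

# Example: a dual-socket, 10-core blade with 256 GB of memory at an
# assumed 8:1 vCPU oversubscription ratio.
vms = desktops_per_host(host_cores=20, vcpu_ratio=8, host_mem_gb=256)
print(vms, steady_state_iops(vms))
```

Under these assumptions the estimate comes out at 160 desktops per blade; the 145 users per blade actually observed in testing lands sensibly below that theoretical ceiling.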
To optimize I/O performance, we created volumes symmetrically across both NetApp controllers, implementing a combination of Fibre Channel LUNs (for UCS server boot and Hyper-V write cache) and iSCSI volumes (for Cluster Shared Volumes). The iSCSI volumes offer a performance advantage since they are not subject to the block misalignment that sometimes adversely impacts VDI performance. For VDI workloads in which a single master image serves many desktops, flash-based cache in the NetApp design helps to reduce latency. NetApp also uses Write Anywhere File Layout (WAFL) technology to enhance performance: WAFL buffers random write operations, converting a group of random write I/Os into a single sequential write from cache to disk. As configured, the storage system delivered the required IOPS per desktop, providing I/O performance comparable to NFS-based VDI storage solutions.
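As a purely conceptual sketch (not NetApp's actual WAFL implementation), the write-coalescing idea works like this: random block writes accumulate in cache, and a flush emits them as one sorted, sequential pass instead of one disk seek per write:

```python
# Conceptual illustration of write coalescing, NOT NetApp's WAFL code:
# buffer random (block, data) writes in cache, then flush them to disk
# as a single sorted, sequential pass.

def coalesce_writes(writes):
    """Collapse buffered (block, data) writes into one sequential flush list."""
    buffered = {}
    for block, data in writes:      # a later write to the same block wins
        buffered[block] = data
    # One pass over the sorted block numbers = sequential disk I/O.
    return sorted(buffered.items())

random_writes = [(907, b"a"), (12, b"b"), (505, b"c"), (12, b"d")]
flush = coalesce_writes(random_writes)
# Four random writes collapse into a three-entry sequential flush.
```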
To understand the test methodology used, you can view this short, 4½-minute video.
To conduct a test run, we performed this sequence of steps, which are recorded in the video:
We started performance monitoring scripts to capture comprehensive subsystem metrics. The scripts recorded performance statistics for all infrastructure components (Hyper-V hosts, Desktop Controllers, PVS servers, client launchers, and NetApp controllers — basically all test environment components but the virtualized desktops themselves). We also recorded start and stop times in a time log.
Initially the 2000 desktops were powered off. We took the desktops out of maintenance mode, which caused the XenDesktop Controller to communicate with the Virtual Machine Manager and begin the VM power-up process. With 2000 desktop VMs provisioned across 14 blade servers, the Desktop Controller booted all of the desktops in about 30 minutes. Note: Powering on the desktop VMs is the first phase of the test.
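The boot-storm figures above reduce to simple arithmetic: 2000 desktops in roughly 30 minutes works out to about 67 power-ons per minute across the 14 blades:

```python
import math

desktops = 2000
blades = 14
boot_minutes = 30

vms_per_minute = desktops / boot_minutes        # ~66.7 VM power-ons/minute overall
vms_per_blade = math.ceil(desktops / blades)    # ~143 desktop VMs hosted per blade
per_blade_rate = vms_per_minute / blades        # ~4.8 VM power-ons/minute per blade
```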
Next, we started the Login VSI (www.loginvsi.com) portion of the test. Login VSI is a load generation tool for VDI benchmarking that simulates activity of production users. Using the Login VSI console, we selected the default Medium workload. This workload represents a “Normal” knowledge worker (a less-intensive workload than that of a “Power” user). We started the Login VSI test, which launched 2000 desktop sessions and simulated user login on each. Once all 2000 users were logged in, the steady state portion of the test began. Note: Logins, or ramp-ups, are the second phase of testing followed by the third, which is steady state.
In the steady state part of the test (when all sessions were active), Login VSI simulated an office productivity workload (Office 2010, Internet Explorer with a Flash video applet, and Adobe Acrobat Reader). Login VSI tracks user experience statistics, looping through specific operations and measuring response times at regular intervals. Response times are used to determine Login VSIMax, the maximum number of users that the test environment can support before performance degrades consistently. In our testing, the success criteria required that application response times not exceed the baseline by more than 4000 ms for six or more consecutive measurements. For this validation, VSIMax scores were not reached.
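Our reading of that success criterion can be sketched as a consecutive-threshold check (a simplification for illustration; Login VSI's actual VSIMax calculation is more involved):

```python
def threshold_exceeded(response_ms, baseline_ms, margin_ms=4000, streak=6):
    """Return True if response times exceed baseline + margin for
    `streak` or more consecutive samples (the failure condition)."""
    run = 0
    for r in response_ms:
        run = run + 1 if r > baseline_ms + margin_ms else 0
        if run >= streak:
            return True
    return False

# Five consecutive slow samples: still within the success criteria.
ok = not threshold_exceeded([5000] * 5 + [800], baseline_ms=800)
# Six consecutive slow samples: VSIMax-style failure.
bad = threshold_exceeded([5000] * 6, baseline_ms=800)
```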
While Login VSI looped through its workload, we monitored test run progress using the Desktop Controller and Login VSI consoles. The Login VSI sessions started logging off after a specified amount of steady state time had elapsed. Note: Logoffs are the final phase of the test.
After all Login VSI sessions were logged out, we stopped the performance monitoring scripts and closed the logs.
The reference implementation achieved impressive results in our test runs, especially in comparison to previous testing with earlier-generation Cisco blades. In fact, the tested configuration reduced the rack space needed to support 2000 users from 30 Rack Units (RUs) to 12 RUs. This density can help lower power use, which translates directly into energy savings.
The results specifically showed that the Cisco UCS B230 M2 half-width blade (featuring dual-socket, 10-core processors and 256GB of memory) can support medium workloads up to 145 users per blade. A single FlexPod VDI design containing two Cisco UCS chassis units with 14 such blades can support at least 2000 virtual desktop workloads while maintaining acceptable response times. The results clearly demonstrated linear scalability of desktop workloads for this reference architecture. Detailed metrics are given below.
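At the observed density of 145 medium-workload users per blade, sizing the blade count for a target user population is simple division:

```python
import math

USERS_PER_BLADE = 145   # density observed in this testing (Medium workload)

def blades_needed(users, users_per_blade=USERS_PER_BLADE):
    """Blades required for a target user population at the tested density."""
    return math.ceil(users / users_per_blade)

# 2000 users fit on the 14 blades in two FlexPod chassis (14 * 145 = 2030).
print(blades_needed(2000))  # → 14
```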
Single Blade Scalability Results
The graph below shows that the configuration scaled easily to support 145 users on a single blade while maintaining acceptable response times.
The following graphs show CPU utilization, memory consumption, network traffic (received and sent), read and write IOPS, respectively, for our test of up to 145 users on a single blade. Because the other core infrastructure components were sized proportionally, CPU utilization was the gating factor for determining desktop host servers’ capacity.
Multiple Blade Scalability Results
In the second phase of FlexPod testing with 14 blades in 2 chassis, we saw linear scalability with up to 2000 virtual desktop workloads. The graph below shows that VSIMax was not reached but that the single FlexPod configuration scaled easily to support 2000 users.
The following graphs show CPU utilization, memory consumption, and read and write IOPS, respectively, for our test of up to 2000 users on 14 blades. We reached a peak CPU utilization of about 90% during this test run. Again, this indicates how the server’s processors can be fully utilized given sufficient core infrastructure resources.
While you should take into account all relevant configuration factors — user populations, user workload types, etc. — to design an actual production deployment, the testing validates the high density and linear scalability of this joint Citrix, Microsoft, Cisco, and NetApp solution. For more details on the tests and resulting metrics, see the reference architecture documents listed below.