When coupled with HP ProLiant servers, Citrix VDI-in-a-Box creates a low-cost cookie-cutter approach for rapidly deploying VDI environments. Together, these products form a turnkey-like appliance that makes VDI deployment remarkably easy — the challenge is simply determining the number and type of HP ProLiant servers needed to match user capacity and workload requirements.

To take the guesswork out of capacity planning and sizing, I worked with HP’s Client Virtualization Architect, Kirk Manzer, to validate specific HP server and Citrix VDI-in-a-Box configurations. We wanted to come up with scalability guidelines to help customers predict capacity and identify how many users could be supported with acceptable response times under a defined workload. The white paper titled “HP Client Virtualization SMB Reference Architecture for Citrix VDI-in-a-Box” contains details about the testing and the data I collected.

Test Environment Overview

The test environment included these core components:

•      Citrix VDI-in-a-Box. Citrix VDI-in-a-Box is a virtual appliance that runs on a hypervisor-enabled server to deliver centrally managed virtual desktops. It is a complete VDI solution designed to scale simply by adding additional units, and features built-in connection brokering and load balancing across multiple servers. An N+1 configuration provides a highly available solution — if a server fails, Citrix VDI-in-a-Box fails over to the remaining servers to host the virtual desktops. Citrix VDI-in-a-Box takes advantage of the Citrix Receiver and the HDX protocol to support a variety of device types, including smartphones, tablets, thin clients, laptops, and workstations.

•      HP ProLiant DL380p Gen8 Server. In the testing, I used two different configurations of HP ProLiant servers, designing them to support capacities of 50 and 100 users. The testing served to validate server configurations at these capacities. The local storage configuration (based on an HP Smart Array 6 Gb/s PCIe 3.0 SAS controller and eight 10,000-RPM SAS disks configured as RAID 0+1 volumes) was the same for both servers; only the compute and memory configurations differed. To support 50 users, I configured a server with a single Intel Xeon E5-2680 CPU @ 2.70GHz (8 cores, 16 threads) and 96 GB of memory, while the 100-user configuration had twice the CPU resources — two Intel Xeon E5-2680 CPUs — and 128 GB of memory. (For a 100-user production environment, Citrix recommends 192 GB of RAM to ensure adequate memory resources.)

•      Microsoft Windows Server 2008 R2 SP1 Hyper-V. VDI-in-a-Box runs on the Microsoft Hyper-V, Citrix XenServer, and VMware vSphere hypervisors. I selected Microsoft Hyper-V for these tests.

The test harness I used included Login VSI (www.loginvsi.com), a load generation tool for VDI benchmarking that simulates production user workloads. For this testing, I selected the default Medium workload to simulate the desktop activity of a typical knowledge worker. Login VSI generates an office productivity workload that includes Office 2010 (Microsoft Outlook, Word, PowerPoint, and Excel), Internet Explorer with a Flash video applet and a Java app, and Adobe Acrobat Reader.

Configuration Nuances

From other performance testing that I’ve done, it’s clear that maintaining a balance of compute, memory, and I/O resources is critical to scaling the number of desktops effectively. In these tests, doubling the compute and memory resources to address twice the user capacity proved a reasonable starting point for achieving acceptable performance. In sizing production configurations, the type of application workload has a significant impact, so be sure to use the results here as a general guideline rather than a strict rule of thumb.

In the test configurations, Citrix VDI-in-a-Box takes advantage of inexpensive local storage rather than complex shared SAN or NAS configurations. One variable in designing a production environment is just how much storage capacity is required, which will vary according to the number and size of the desktop images, the swap space configured, and extra space for growth. VDI-in-a-Box supports a one-to-many imaging approach that conserves disk space — desktops share a master golden image and the software tracks individual differences from that master. You can estimate disk space requirements based on the number and size of your golden images and the number of desktops that will be created from each.

Although scalability (rather than availability) was the focus of my testing, it’s important to point out that integral solution features enable highly available desktop services. In a deployment with N+1 servers, Citrix VDI-in-a-Box uses a shared-nothing architecture so that all servers are equally replaceable and function as peers to one another. As long as you pad the deployment with extra server capacity, if one server fails, another server can continue to provide virtual desktops for your environment.

Test Methodology

For each test run, I followed this sequence of steps:

1) Using the VDI-in-a-Box Management Console, I verified that all desktops were powered up and in an idle “Hold” state.

2) I restarted the VSI launchers and verified that they were ready for testing.

3) I started a script that invoked PerfMon scripts to capture comprehensive system performance metrics.

4) With the desktops powered up and idle, I initiated the workload simulation portion of the test using Login VSI. Depending on the test run, either 50 or 100 desktop sessions were launched and Login VSI simulated user logins on each.

Once all users were logged in, the steady state portion of the test began in which Login VSI tracked application performance statistics, looping through specific operations and measuring response times at regular intervals. Response times are used to determine Login VSIMax, the maximum number of users that the test environment can support before performance degrades consistently. In the testing, I defined success criteria as application response times lower than 4000 ms (over the baseline) with exceptions occurring fewer than six times consecutively. For these initial test runs of 50 and 100 users, VSIMax scores were not reached.

5) While Login VSI looped through its workload, I monitored test run progress using the VDI-in-a-Box Manager and Login VSI consoles. (The launched-to-active desktop ratio shouldn’t fall behind more than 2 or 3 desktops for single server testing.) After a specified amount of elapsed steady state time, Login VSI started to log off the desktop sessions.

6) After all sessions were logged off, I stopped the performance monitoring scripts.

7) Lastly, I processed the Login VSI logs using VSI Analyzer and the PerfMon CSV files using PAL (Performance Analysis of Logs) to analyze the test results.

Test Results

The VDI configurations that I tested showed linear scalability for 50 or 100 users under a Login VSI Medium workload. Given the scaled compute and memory resources in the HP ProLiant server, I was able to scale linearly from 50 to 100 users while maintaining acceptable response times without exhausting subsystem resources. Detailed results for the test runs are given below.

 

50-User Results

The following graphs show Login VSIMax, CPU utilization, memory consumption, network traffic (received and sent), and calculated read and write IOPS, respectively, for this test configuration. The graphs show that the single Xeon E5-2680 CPU with 8 physical cores/16 threads and 96 GB of memory provided sufficient resources to maintain acceptable response times throughout the 50-user test.

The 50-user test configuration followed these general metrics:

·     Desktop to physical core ratio: 6.25 to 1

·     Memory per desktop: 1 GB per desktop was tested; the VSI Medium workload consumes ~800 MB per desktop at peak. (A minimum of 1.5 GB per desktop is recommended, plus an allowance for hypervisor OS overhead.)

·     Storage read/write ratio

o     Average: 47/53 reads/writes

o     Max: 58/42 reads/writes

·     Storage IOPS

o     Per-desktop average: ~10.04

o     Per-desktop max: ~39.18

It’s important to note that the storage data above reflects all phases of testing (logons, steady state, and logoffs). While you can use these results as general sizing guidelines, remember that you must tailor every configuration to match specific workload types and user capacities.

 

Login VSI Data (no VSIMax)

The graph below shows that the configuration scaled easily to support 50 users while maintaining acceptable response times. Response times never exceeded 4000 ms more than six times consecutively, so the 50-user configuration never reached Login VSIMax.

Login VSIMax, the maximum capacity of the tested system expressed as a number of Login VSI sessions, is derived from the Login VSI Analysis Tool. Within each workload test loop, the response times of seven specific operations are measured at a regular interval, six times within each loop. The response times of these seven operations are used to establish VSIMax.

CPU (Single, 16 logical cores)

This metric records utilization of the physical processors in the host computer. The “\Hyper-V Hypervisor Logical Processor(*)\% Total Run Time” counter is the best measure of overall processor utilization on a Hyper-V server; the system’s “% Processor Time” counter is less accurate because it measures only the host’s processor time.

Memory (96GB Total)

Available MBytes is the amount of physical RAM, in megabytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free, and zero page lists. If this counter is low, then the computer is running low on physical RAM.

Storage IOPS

The Calculated IOPS metric represents the combined rate of read and write operations on the C: disk.

Network

Network Interface Bytes Total/second is the combined rate at which bytes are sent and received over each network adapter, including framing characters.

 

100-User Results

The following graphs show Login VSIMax, CPU utilization, memory consumption, network traffic (received and sent), and calculated read and write IOPS, respectively, for the 100-user configuration. The graphs show that the HP ProLiant server with two Xeon E5-2680 CPUs (16 physical cores/32 threads total) and 128 GB of memory provided sufficient resources to maintain acceptable response times throughout the test.

As a guideline in sizing, the 100-user test generated these metrics:

·     Desktop to physical core ratio: 6.25 to 1

·     Memory per desktop: 1 GB per desktop was tested; the VSI Medium workload consumes ~800 MB per desktop at peak. (A minimum of 1.5 GB per desktop is recommended, plus an allowance for hypervisor OS overhead.)

·     Storage read/write ratio

o     Average: 42/58 reads/writes

o     Max: 55/45 reads/writes

·     Storage IOPS

o     Per-desktop average: ~9.2

o     Per-desktop max: ~23.54

It’s important to note that the storage data above reflects all phases of testing (logons, steady state, and logoffs). Remember that these results are general guidelines; tailor configurations to match specific workload types and user capacities.

Login VSI Data (no VSIMax)

The graph below shows that the configuration scaled easily to support 100 users while maintaining acceptable response times under a Medium Login VSI workload. Once again, response times did not exceed 4000 ms more than six times consecutively, so the 100-user configuration did not reach Login VSIMax, the maximum capacity expressed as a number of Login VSI sessions. To establish VSIMax, the Login VSI Analysis Tool records response times of seven operations within each workload test loop. 

CPU (Dual, 32 logical cores)

This metric records overall processor utilization of the Hyper-V server.

Memory (128GB Total)

This metric indicates the available amount of physical RAM, in megabytes.

Storage IOPS

Calculated IOPS shows the rate of read and write operations.

Network

This metric is the rate at which bytes are sent and received over each network adapter, including framing characters.

 

 

Summary

While you should take into account all relevant configuration factors — user workload types, availability needs, etc. — when sizing a production deployment, my testing validated the reference architecture 50- and 100-user configurations. The results demonstrated linear scalability of this HP ProLiant and Citrix VDI-in-a-Box solution under a Medium Login VSI workload. As CPU and memory resources scaled, the user population scaled linearly without any significant degradation in response times. For more details on the testing and results, see the reference architecture report.

Since Login VSIMax (the point at which performance starts to degrade consistently) was never reached in either the 50- or 100-user test runs, I wondered at what point performance would start to degrade. To better understand capacity limitations, I ran a subsequent series of tests that you will be able to read about in parts 2 and 3.

 

References

•      HP Client Virtualization SMB Reference Architecture for Citrix VDI-in-a-Box (PDF)

•      HP and Citrix VDI-in-a-Box: Part 2 – Pushing the Limits (Blog)

•      HP and Citrix VDI-in-a-Box: Part 3 – User Experience Testing (Blog)