Follow me @TonySanchez_CTX

Architectures, whether physical or virtual, should be flexible enough to adapt to different workloads, allowing them to support changing business needs. Although implementing a new IT architecture takes time and careful planning, the process to test and validate an architecture should be easy. In the case of a virtual desktop architecture, test engineers should be able to follow a repeatable pattern, step by step, simply changing out the workload to validate the architecture under different anticipated user densities, application workloads, and configuration assumptions. The procedure should be as easy as learning a new series of dance steps (think PSY’s Gangnam Style, the most watched dance video on YouTube). As a test engineer, that leads me to ask: in the case of VDI, why can’t a hypervisor simply learn a new workload just like I might learn a new sequence of dance steps?

Luckily for test engineers, Citrix FlexCast® provides the ability to learn and deliver any workload type by leveraging the power of Citrix Provisioning Services® (PVS). Recently I worked with engineers from Citrix and Dell, collaborating to build a FlexCast reference architecture for deploying XenApp® and XenDesktop® on Hyper-V on a Dell infrastructure. Testing of this reference architecture looked at how XenApp and XenDesktop performed under various workloads, altering hypervisor configuration settings and examining the overall user experience and user densities. At the drop of a dime, FlexCast and PVS enabled a simple switch of the architecture to a new workload.

Based on that reference architecture effort, we recently began a Single Server Scalability (SSS) test using the latest hardware and software releases available. This blog focuses on that effort — what I call the “XenApp dance step for FlexCast style” and how XenApp workloads perform on Hyper-V. (A follow-on blog article will focus on an alternate “dance” sequence for XenDesktop.) The focus of this blog is how the configuration of the McAfee virus scanning software can impact performance and scaling.

In previous blogs, I described the testing process and methodology that leverages the Login VSI test harness, along with key tips for success. Since those same methods and recommendations apply here, let’s review the configurations we used for this scalability testing as well as the workloads and actual test results. If you’re not familiar with Login VSI, please take some time to review how their products can help you with application and hardware scalability testing by visiting www.loginvsi.com.

For background reading, I highly recommend that you review Frank Anderson’s post on XenApp physical versus virtual testing results with Hyper-V. Frank is my colleague and a great resource for insights about testing, including implementation tips and general best practices. In addition, the related Dell and Citrix white paper describing the FlexCast reference architecture for deploying XenApp and XenDesktop on Hyper-V is available here.

 

With these resources in mind, let’s get right down to the business of learning the “dance steps” and presenting our test results.

McAfee Exclusions for XenApp

For XenApp to provide optimal performance, scalability, and a responsive user experience, defining exclusions from McAfee virus scanning is a necessity. Configuration parameters specify whether virus scanning should occur when reading from disk, when writing to disk, or both, as shown in the configuration screenshot below.

        

Some highly secure environments require that files are scanned both when reading and writing to disk, while others only require scanning on writes. This blog describes test results that compare the performance and scalability of configurations when scanning is performed on reads and writes versus when it is performed on writes only.

Compute Host Hardware

For the compute host building block, we chose a Dell R720 server configured as follows:

 

Below is a diagram depicting the XenApp compute host building block. The Hyper-V host was constructed with a single RAID 1 volume for the OS and a RAID 10 volume for virtual machine write cache and BIN file storage. Windows Server 2008R2 SP1 with Hyper-V requires a BIN file for each virtual machine. The BIN file is used to save the running state of the guest operating system in each virtual machine.

To estimate the total amount of storage needed on the compute host, then, we need to take into account the local PVS write cache and the BIN file for each virtual machine. The BIN file is equal in size to the paging file set at the master image for the operating system level.  Let’s consider an example:

Example: Suppose we want to build XenApp VMs on a host running Windows Server 2008R2 with Hyper-V. The equation to calculate the amount of storage needed for all VMs on the host is:

Total size of all Write Cache VHD files + Total size of all BIN files = total storage space required

In a case where XenApp VMs are created with a 10GB write cache partition, BIN file storage would require an additional 10GB, for a total local disk storage of 20GB per VM. In this case, if we want to build 10 XenApp VMs on a host running Windows Server 2008R2 SP1 with Hyper-V, a total of 200GB of local storage is required. It is important to note that the BIN file is created at its full size when a virtual machine is powered on and is deleted when the virtual machine is powered off. The BIN file can also increase IOPS on the host, so plan BIN file locations carefully.
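The sizing rule above can be sketched in a few lines of Python. This is only a minimal sketch, not a Citrix or Microsoft tool: the function name `local_storage_gb` and its parameters are made up for illustration, and it assumes every VM on the host uses the same write cache and BIN file sizes.

```python
def local_storage_gb(vm_count, write_cache_gb, bin_file_gb):
    """Estimate local storage (GB) for PVS write cache VHDs plus Hyper-V BIN files.

    Hypothetical helper: assumes all VMs on the host share the same sizes.
    """
    if vm_count < 0 or write_cache_gb < 0 or bin_file_gb < 0:
        raise ValueError("sizes and counts must be non-negative")
    total_write_cache = vm_count * write_cache_gb  # all write cache VHD files
    total_bin = vm_count * bin_file_gb             # all BIN files
    return total_write_cache + total_bin


# The example from the text: 10 VMs, each with a 10GB write cache and a 10GB BIN file.
print(local_storage_gb(10, 10, 10))  # 200
```

Swapping in your own VM count and sizes gives a quick first estimate before you settle on a RAID layout for the write cache volume.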

Note: Windows Server 2012 Hyper-V introduces a new option as part of the Automatic Stop Action setting, so that a Windows guest machine is no longer required to save its running state. With this newer release, the BIN file is no longer a hard requirement, which conserves storage space and can boost performance because fewer IOPS are required. An example of this new feature for Windows Server 2012 is shown below. Ben Armstrong from Microsoft has a great blog post, which you can read here, that describes this feature in more detail.

 

Management Host Hardware

For the management host building block, a Dell R720 server was configured as follows:

 

 

For management functionality, we used one compute host to run the core infrastructure roles. The virtual machines for the infrastructure roles were stored on a dedicated Dell EqualLogic PS4100XV SAN in a Microsoft Cluster Shared Volume (CSV) so that they could be migrated to a new host if needed. The infrastructure VMs are as follows:

  • DC
  • SCVMM 2012
  • SCOM 2012
  • Desktop Controller
  • Web Interface
  • Zone Data Collector
  • SQL
  • PVS
  • Citrix Licensing
  • File Server

Provisioning Server (PVS)

Provisioning Server 6.1 was used with the appropriate hotfixes. We also applied the best practices guide for PVS with the appropriate network tweaks. The PVS VM was assigned 4 vCPUs and 24GB of RAM. The PVS write cache for the master XenApp image was set to a fixed 20GB VHD and was also set to client-side. The PVS wizard was used to create all 10 XenApp VM images. McAfee Antivirus 8.7 was installed and configured on the PVS VM, and the best practice guide was applied to ensure that the proper exclusions for the vDisks and other critical pieces for PVS were in place. The best practice guide for McAfee on PVS can be found here.

System Center Virtual Machine Manager 2012

System Center Virtual Machine Manager 2012 was leveraged for VM management at the Hyper-V compute host level. No service pack was applied to System Center 2012 Virtual Machine Manager. The PVS write cache was also stored in the VMM library and attached to the master XenApp VMM template. McAfee Antivirus 8.7 was installed and configured at the compute host and the best practice guide was applied to ensure that the proper exclusions for the virtual machines were in place. The best practice guide for McAfee on Virtual Machine Manager can be found here.

Hyper-V Compute Host

For the compute host building block, the default installation of Windows Server 2008R2 Enterprise with SP1 was applied and the role of Hyper-V was enabled. We created two virtual networks in Hyper-V: one network for management and the other for public PVS stream traffic. All the appropriate Windows Server and Hyper-V hotfixes were applied via Windows Update. Two logical drives were created for the Hyper-V host: volume 0 was for the operating system and volume 1 for write cache storage. McAfee Antivirus 8.7 was installed and configured at the compute host and the best practice guide was applied to ensure that the proper exclusions for the virtual machines were in place. The best practice guide for McAfee on Hyper-V can be found here.

XenApp 6.5

XenApp 6.5 Platinum was installed and configured for the master image. All XenApp public Hotfix Rollup Packs were applied. The PVS client was installed as part of the default install of XenApp. Note that, by default, the XenApp 6.5 ISO image only installs the PVS 5.6 SP1 client. While the 5.6 SP1 client can connect to a PVS 5.6 or 6.1 farm, it is recommended to keep the PVS client and the PVS server at the same release version. If you choose not to use the PVS 5.6 SP1 client bundled with the XenApp 6.5 ISO, uncheck the PVS target device box during installation and mount the PVS 6.1 ISO to the XenApp VM after XenApp installation to install the PVS client that matches the server version.

McAfee Antivirus 8.7 was also installed and configured on the XenApp master image. The best practice guide was applied to ensure that the proper exclusions for the virtual machines were in place. The best practice guide for McAfee on Hyper-V can be found here.  As the test results in this blog article show, the settings for McAfee virus scanning applied to XenApp (write versus read/write) greatly impact performance and scalability.

Application Workloads

During the test design process, we wanted to create a realistic application workload. With that in mind, we tried to leverage the most commonly used applications on XenApp that would be typically found in production. The following applications were among those included in the test workload:

  • Microsoft Office 2010
  • Adobe Reader
  • Adobe Flash
  • McAfee Antivirus 8.7

Login VSI 3.6 was the workload generator. Test configurations focused on two specific workloads: a Basic user (VSI Light) workload and a Standard user (VSI Medium) workload. Both sets of tests leveraged HDX flash technology, but the flash rendering parameter was set to server-side, which can be changed at any time. 

Workload Independent Settings

The table below summarizes the configuration of the compute host environment to support both XenApp task and knowledge worker scenarios.

Although 128GB of memory was configured during the XenApp performance analysis, it is possible to configure the server with 96GB and provide minimal resources for each VM.

Task Worker (Basic Workload) – McAfee Writes Only

The Basic workload runs a small number of applications that are representative of applications used by task workers (such as call center employees). The applications are closed immediately after use, resulting in relatively low memory and CPU consumption in comparison to the Standard workload. Applications in the Basic workload include Internet Explorer, Microsoft Word 2010, and Microsoft Outlook 2010, with only two of these applications running simultaneously. User idle time is approximately 17% of total run time. In this test, only the McAfee “when writing to disk” box was selected for the master XenApp image.

Configuration Summary – Task Worker

The table below gives the parameters for virtual desktop VMs configured for a task worker.

CPU Resource Utilization Performance Analysis Results

The CPU graphs below show logical and virtual processor utilization for the compute host under the Basic workload. Hyper-V provides hypervisor performance objects to monitor the performance of both logical and virtual processors. The number of logical processors correlates directly to the number of processor cores that are installed on the physical computer. (For example, two quad-core processors installed on the physical computer correlate to 8 logical processors.) Virtual processors are what the virtual machines actually use, and all execution in the root and child partitions occurs in virtual processors.

The results below show sustained logical processor % runtime, peaking at approximately 90%. Logical processor % runtime is the key parameter for performance analysis of guest operating systems. A peak of 80% is recommended to optimize density while providing sufficient headroom to ensure that the end-user experience does not diminish. A high logical processor % runtime combined with a low virtual processor % runtime is typical of an environment where there are more processors allocated to VMs than are physically available on the compute host, which is the case for this VDI environment.
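To make the 80% guideline concrete, here is a small Python sketch that compares a measured peak against the recommended ceiling. The function name and the idea of reporting headroom as a signed percentage-point difference are my own illustration, not part of any Hyper-V tooling; the 80% target and the 90% peak are the figures quoted above.

```python
RECOMMENDED_PEAK_PCT = 80.0  # guideline quoted in the text


def cpu_headroom(peak_pct, target_pct=RECOMMENDED_PEAK_PCT):
    """Return the percentage points of headroom left below the target.

    A negative result means the measured peak exceeds the target.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= peak_pct <= 100.0:
        raise ValueError("peak_pct must be between 0 and 100")
    return target_pct - peak_pct


headroom = cpu_headroom(90.0)  # peak logical processor % runtime reported for this test
status = "within" if headroom >= 0 else "over"
print(f"{abs(headroom):.0f} points {status} the {RECOMMENDED_PEAK_PCT:.0f}% guideline")
# prints "10 points over the 80% guideline"
```

A result like this suggests the host was pushed past the recommended density for this workload, which is the point of a scalability test.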

Memory Resource Utilization Performance Analysis Results

The memory graph shows the amount of memory used on the host for the Hyper-V host operating systems with 8 XenApp machines active for the duration of the test under the Basic workload. Although 128GB of memory was configured during the XenApp performance analysis, it is possible to configure the server with 96GB and provide minimal resources. The diagram illustrates that additional memory is available in the environment—maximum memory utilization is 92%.

 

Disk I/O Resource Utilization Performance Analysis Results

The Disk I/O and latency graphs below illustrate that the Hyper-V host environment is performing to expected levels for a task worker. The average IOPS during the duration of the test when all user sessions were connected and active was 1,021, with an average of 4.6 IOPS per XenApp user. It should be noted that these disk I/O figures are for the compute host D: drive where the virtual desktops reside; disk activity on the C: drive is minimal.

Knowledge Worker (Standard Workload) – McAfee Writes Only

The Standard workload runs applications representative of those used by knowledge workers (such as accountants). The applications in this workload are Internet Explorer, a number of Microsoft Office 2010 applications (Excel, Outlook, PowerPoint, and Word), Adobe Acrobat Reader, Bullzip PDF printer, and 7-zip file compression software. Relative to the task worker (Basic user) workload discussed previously, idle time is slightly lower as a percentage of overall runtime and a maximum of 5 applications are open simultaneously (compared to 2 open applications for a task worker).

Configuration Summary – Knowledge Worker

The table below gives XenApp configuration parameters for the virtual shared desktop VMs used for a knowledge worker.

CPU Resource Utilization Performance Analysis Results

The CPU graphs below illustrate logical processor utilization under the Standard workload. The results show sustained logical processor % runtime peaking at approximately 98%. A sustained peak logical processor % runtime of 80% is ideal for an environment that optimizes density while providing sufficient headroom to ensure that the end-user experience does not diminish. A high logical processor % runtime combined with a low virtual processor % runtime is typical of an environment where there are more processors allocated to VMs than are physically available on the compute host, which is the case for this VDI environment. Knowledge worker workloads tend to be more strenuous on logical processors than task worker applications, thus causing a decrease in user density.

 

 

Memory Resource Utilization Performance Analysis Results

The memory graph below shows the amount of memory used on the host for the Hyper-V host operating systems with 8 XenApp machines active for the duration of the Standard workload test. Although 128GB of memory was configured during the XenApp performance analysis, it is possible to configure the server with 96GB and provide minimal resources to each VM. The diagram illustrates that memory availability exists in the environment—the maximum memory utilization is 91%.

 

 

Disk I/O Resource Utilization Performance Analysis Results

The Disk I/O and latency graphs shown below illustrate that the Hyper-V host environment is performing to expected levels for a knowledge worker. The average IOPS during the duration of the test when all user sessions were connected and active was 1,165, with an average of 4.4 IOPS per XenApp user. It should be noted that these disk I/O figures are for the compute host D: drive where the virtual desktops reside; disk activity on the C: drive is minimal.

 

 

Task Worker (Basic Workload) – McAfee Reads and Writes

The Basic workload runs a small number of applications that are representative of applications used by task workers (such as call center employees). The applications are closed immediately after use, resulting in relatively low memory and CPU consumption in comparison to the Standard workload. Applications in the Basic workload include Internet Explorer, Microsoft Word 2010, and Microsoft Outlook 2010, with only two of these applications running simultaneously. User idle time is approximately 17% of total run time. In this test, both the McAfee “when reading from disk” and “when writing to disk” boxes were selected for the master XenApp image.

Configuration Summary – Task Worker

The table below gives the parameters for virtual desktop VMs configured for a task worker.

CPU Resource Utilization Performance Analysis Results

The CPU graphs below show logical and virtual processor utilization for the compute host under the Basic workload. Hyper-V provides hypervisor performance objects to monitor the performance of both logical and virtual processors. The number of logical processors correlates directly to the number of processor cores that are installed on the physical computer. (For example, two quad-core processors installed on the physical computer correlate to 8 logical processors.) Virtual processors are what the virtual machines actually use, and all execution in the root and child partitions occurs in virtual processors.

The results below show sustained logical processor % runtime, peaking at approximately 87%. Logical processor % runtime is the key parameter for performance analysis of guest operating systems. A peak of 80% is recommended to optimize density while providing sufficient headroom to ensure that the end-user experience does not diminish. A high logical processor % runtime combined with a low virtual processor % runtime is typical of an environment where there are more processors allocated to VMs than are physically available on the compute host, which is the case for this VDI environment.

 

 

Memory Resource Utilization Performance Analysis Results

The memory graph shows the amount of memory used on the host for the Hyper-V host operating systems with 7 XenApp machines active for the duration of the test under the Basic workload. Although 128GB of memory was configured during the XenApp performance analysis, it is possible to configure the server with 96GB and provide minimal resources. The diagram illustrates that additional memory is available in the environment—maximum memory utilization is 84%.

 

 

 

Disk I/O Resource Utilization Performance Analysis Results

The Disk I/O and latency graphs below illustrate that the Hyper-V host environment is performing to expected levels for a task worker. The average IOPS during the duration of the test when all user sessions were connected and active was 839 with an average of 4.2 IOPS per XenApp user. It should be noted that these disk I/O figures are for the compute host D: drive where the virtual desktops reside—disk activity on the C: drive is minimal.

 

 

 

Knowledge Worker (Standard Workload) – McAfee Reads and Writes

The Standard workload runs applications representative of those used by knowledge workers (such as accountants). The applications in this workload are Internet Explorer, a number of Microsoft Office 2010 applications (Excel, Outlook, PowerPoint, and Word), Adobe Acrobat Reader, Bullzip PDF printer, and 7-zip file compression software. Relative to the task worker (Basic user) workload discussed previously, idle time is slightly lower as a percentage of overall runtime and a maximum of 5 applications are open simultaneously (compared to 2 open applications for a task worker).

Configuration Summary – Knowledge Worker

The table below gives XenApp configuration parameters for the virtual shared desktop VMs used for a knowledge worker.

 

CPU Resource Utilization Performance Analysis Results

The CPU graphs below illustrate logical processor utilization under the Standard workload. The results show sustained logical processor % runtime peaking at approximately 95%. A sustained peak logical processor % runtime of 80% is ideal for an environment that optimizes density while providing sufficient headroom to ensure that the end-user experience does not diminish. A high logical processor % runtime combined with a low virtual processor % runtime is typical of an environment where there are more processors allocated to VMs than are physically available on the compute host, which is the case for this VDI environment. Knowledge worker workloads tend to be more strenuous on logical processors than task worker applications, thus causing a decrease in user density.


Memory Resource Utilization Performance Analysis Results

The memory graph below shows the amount of memory used on the host for the Hyper-V host operating systems with 7 XenApp machines active for the duration of the Standard workload test. Although 128GB of memory was configured during the XenApp performance analysis, it is possible to configure the server with 96GB and provide minimal resources to each VM. The diagram illustrates that memory availability exists in the environment—the maximum memory utilization is 91%.

 

 

Disk I/O Resource Utilization Performance Analysis Results

The Disk I/O and latency graphs shown below illustrate that the Hyper-V host environment is performing to expected levels for a knowledge worker. The average IOPS during the duration of the test when all user sessions were connected and active was 792, which represents an average of 5 IOPS per XenApp user. It should be noted that these disk I/O figures are for the compute host D: drive where the virtual desktops reside; disk activity on the C: drive is minimal.

 

Overall Analysis Results

In this series of tests, we can see that configuring McAfee to scan on both reads and writes can have an impact on VM density for virtualized XenApp VMs on Hyper-V. While the IOPS per XenApp user did not differ drastically, ranging between 4 and 5, the key thing to note is the impact on the Hyper-V logical processor. The average processor utilization in the Basic and Standard workloads with McAfee scanning on reads and writes ranged from 45% to 52%. Enabling scanning on both reads and writes also decreased the number of XenApp users and XenApp VMs that the Hyper-V host could support. The average processor utilization in the Basic and Standard workloads with McAfee scanning on writes only ranged from 41% to 42%, with an increase in XenApp VM density. Enabling this McAfee function should be planned carefully in order to provide the optimal user experience and Hyper-V host performance with XenApp workloads. In our next set of tests we’ll change out the Hyper-V version from Windows Server 2008R2 SP1 to Windows Server 2012, so stay tuned for more great information.
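For readers who want to put a number on the density difference, the following Python sketch computes the relative change in XenApp VM count between the two scan settings. The helper name `relative_change_pct` is my own, and the VM counts (8 with scanning on writes only, 7 with scanning on reads and writes) are the ones given in the memory sections of this post. Treat the result as an observation from a single host configuration, not a general rule.

```python
def relative_change_pct(baseline, new):
    """Percent change from baseline to new (negative means a decrease).

    Hypothetical helper for illustration only.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# XenApp VM counts reported in the memory sections of this post.
vms_writes_only = 8
vms_reads_and_writes = 7

change = relative_change_pct(vms_writes_only, vms_reads_and_writes)
print(f"VM density change with read scanning enabled: {change:.1f}%")
# prints "VM density change with read scanning enabled: -12.5%"
```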