When deploying a Provisioning Services environment, the question of where to host the vDisk store and the VHD files for target devices often arises. There are really only two primary options:

1) Block Level Storage

2) NAS Storage (CIFS or in some cases NFS)

I go into quite a bit of detail as to why one should select block level storage, and the benefits of Windows System Cache in reducing the disk IOPS associated with Provisioning Server, in a white paper I wrote entitled Advanced Memory and Storage Considerations for Provisioning Services. You can check out the white paper here: http://support.citrix.com/article/ctx125126. I encourage everyone to first read the white paper as it will help in understanding the rest of this post.

In the white paper I discuss how Windows System Cache works and explain that CIFS should never be used as a Provisioning Services Store, because the Provisioning Server does not cache vDisk contents in System Cache memory when using a CIFS share. I have always found it odd that the Provisioning Server does not cache data from CIFS shares, because caching of files from CIFS shares is supported by the Windows OS. I decided to do some more digging into this issue and came across some interesting and very promising results! But first, a little background.

Network Load with CIFS Stores

When a target device boots from a Provisioning Server, the Provisioning Server must read the data from the VHD file and transmit it over the network to the target device. Assuming that 200 MB of data is read when a target device boots and starts all of its services, this is both a 200 MB read operation by the Provisioning Server and a 200 MB transmit operation by the Provisioning Server. If the Store is on a CIFS server then this is 400 MB of total network traffic. This is illustrated in the diagram below:

In an ideal situation the Provisioning Server would only read this 200 MB of data from the CIFS server one time and store it in System Cache memory. Thus, for subsequent target device boots, only 200 MB of data would be transmitted over the network from the Provisioning Server to the target device. There would be no data transferred over the network from the CIFS server to the Provisioning Server as all of the data is already in the Provisioning Server System Cache memory. Unfortunately, this is not the case with Provisioning Server. When using Provisioning Server with a CIFS store, you double all the network traffic, which essentially renders CIFS useless as a Provisioning Server Store due to the horrible performance.
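The arithmetic above can be sketched as a small model. This is only an illustration under stated assumptions: a single Provisioning Server, every boot reading the same 200 MB, and a cache that is never evicted. The function name and parameters are hypothetical and are not part of any Citrix tool.

```python
def network_traffic_mb(boots, boot_read_mb=200, store_is_cifs=True, server_caches=False):
    """Return (store_to_server_mb, server_to_target_mb) for a number of target boots.

    Assumptions (not from any Citrix tool): one Provisioning Server, every boot
    reads the same boot_read_mb of vDisk data, and cached data is never evicted.
    """
    # The Provisioning Server always streams the data to each target device.
    server_to_target = boots * boot_read_mb

    if not store_is_cifs:
        # Local block storage: vDisk reads never cross the network.
        store_to_server = 0
    elif server_caches:
        # Ideal case: read from the CIFS share once, then serve from System Cache.
        store_to_server = boot_read_mb if boots > 0 else 0
    else:
        # No caching: every boot pulls the data from the CIFS share again.
        store_to_server = boots * boot_read_mb

    return store_to_server, server_to_target


if __name__ == "__main__":
    for label, kwargs in [
        ("CIFS, no caching", {}),
        ("CIFS, cached", {"server_caches": True}),
        ("local block storage", {"store_is_cifs": False}),
    ]:
        from_store, to_targets = network_traffic_mb(10, **kwargs)
        print(f"{label}: {from_store} MB from store, {to_targets} MB to targets")
```

With ten boots, the uncached CIFS case moves 2000 MB in each direction, while the cached case moves only 200 MB from the file server.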

It is for this reason that we always recommend that you give each Provisioning Server its own local block level storage device to use for the Store. The downside to this approach is that you must keep the VHD files in sync across multiple Provisioning Servers. If you have 6 Provisioning Servers, then you must ensure that the files are copied to all 6 Provisioning Servers. There is the option of using a read only LUN or a shared LUN with a cluster file system such as Melio so that all Provisioning Servers can share the same block level storage device. Unfortunately, implementing a third party cluster file system adds more cost and complexity than it is really worth. The read only LUN feature of Provisioning Server is a pretty useless option as well, because you need multiple LUNs in order to make updates, not to mention the “pain in the rear end” process this introduces when trying to make updates. In reality, most of our customers find it is just easier to manually copy the files across all the Provisioning Server local stores or write a robocopy script.

This is why using CIFS seems like such an appealing option and so many people attempt to use it. You have a highly available clustered file server that simply shares the vDisks for all of the Provisioning Servers. This reduces the drive space needed on each Provisioning Server and allows for one shared repository, which is much easier to update and maintain. Unfortunately, it destroys your network and performs terribly.

I was not happy with the fact that Provisioning Server does not cache the VHD file in System cache RAM when reading it from a CIFS share, since this is something that should work, so I decided to dig into this issue a little deeper in order to figure out what was going on.

SMB and File Caching

When an SMB 1.0 client (NT 4.0/2000/2003/XP) opens a file from a network share it uses a mechanism called Opportunistic Locking (oplock). I will give a quick explanation of Oplocks below, but for more detailed information refer to the following links:

http://msdn.microsoft.com/en-us/library/aa365433(VS.85).aspx

http://msdn.microsoft.com/en-us/library/cc308442.aspx

There are four primary types of oplocks, two of which are most commonly used:

Level 1 – This is an exclusive oplock that allows the client opening the file to cache not only read operations but also write operations. Because writes can be cached on the client, a Level 1 oplock can be held only while that client is the sole opener of the file. When another client or process tries to open the file, the server must first break the oplock so that the cached writes are flushed before the second open can proceed.

Level 2 – This is a shared read only oplock. If a client or process is granted a Level 2 oplock, it can cache read operations. Since only read data is being cached, multiple clients or processes can be granted a Level 2 oplock to the same file.

Based on how Level 2 oplocks work, multiple Provisioning Servers should be able to have a level 2 oplock to the same VHD file on a CIFS share and cache the data that they read from the file. However, for some reason, the Provisioning Servers will not cache the data.
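To make the expectation concrete, here is a deliberately simplified toy model of Level 2 behavior. It is not the real SMB state machine: the class and method names are hypothetical, and the only rules modeled are that any number of readers can hold a Level 2 oplock and that a write to the file breaks all of them.

```python
class SharedVhd:
    """Toy model of a file opened under Level 2 oplocks (illustration only)."""

    def __init__(self):
        # Clients currently holding a shared, read-caching Level 2 oplock.
        self.level2_holders = set()

    def open_read_only(self, client):
        # Level 2 is shared, so every read-only opener can be granted one.
        self.level2_holders.add(client)
        return "level2"

    def can_cache_reads(self, client):
        return client in self.level2_holders

    def write(self, client):
        # A write makes cached read data stale, so all Level 2 oplocks are
        # broken. Return the clients that lost their read caching.
        broken = sorted(self.level2_holders)
        self.level2_holders.clear()
        return broken


if __name__ == "__main__":
    vhd = SharedVhd()
    for server in ("PVS1", "PVS2"):
        vhd.open_read_only(server)
    print([s for s in ("PVS1", "PVS2") if vhd.can_cache_reads(s)])
```

In this model, two Provisioning Servers reading a standard mode vDisk that nobody writes to would both keep their read caching indefinitely, which is the behavior one would expect from the description above.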

Oplocks function pretty much the same way on SMB 1.0 and SMB 2.0. However, they have been significantly improved with SMB 2.1. SMB 2.1 is the version of SMB that ships with Windows 7 and Windows Server 2008 R2. With SMB 2.1 oplocks have been replaced with a new and more powerful mechanism called leasing. There is a great blog on the Microsoft site that describes oplocks and leasing. I encourage everyone to read it.

http://blogs.msdn.com/b/openspecification/archive/2009/05/22/client-caching-features-oplock-vs-lease.aspx

Provisioning Server and Oplocks

Why does Provisioning Server not open files from a CIFS store using a read caching Level 2 oplock? I set up a lab environment to try to figure this out. The first thing I looked at was the registry keys that control SMB and oplocks. To my dismay, I noticed the following keys were configured:

HKLM\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters

"EnableOplocks" = dword:00000000

HKLM\SYSTEM\CurrentControlSet\services\mrxsmb\Parameters

"OplocksDisabled" = dword:00000001

HKLM\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters

"Smb2" = dword:0x00000000

Once I saw these keys, it all started to make sense as Oplocks had been completely disabled and SMB 2 had been turned off. I know that I did not set any of these keys and the only software installed was Provisioning Server 5.6. So I began to wonder why Provisioning Server had turned off oplocks. At that point I figured I needed to do a little RTFM, so I opened up the Provisioning Server Admin Guide and found the following which is a screen shot from page 208 of the Admin guide:

So there you have it, the Provisioning Server installation is what has been turning off oplocks and giving us crappy performance when using CIFS shares. However, the install actually turns off oplocks under LanManWorkstation as well and also disables SMB 2. I could not find references as to why SMB2 is turned off.
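A quick way to check a server for this state is to compare its current values against the three values listed above. The sketch below assumes the values have already been read into a dictionary; the function and constant names are hypothetical. On a live Windows server the values could be read with Python's standard `winreg` module, which is left out here so the example runs anywhere.

```python
WORKSTATION = r"HKLM\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters"
MRXSMB = r"HKLM\SYSTEM\CurrentControlSet\services\mrxsmb\Parameters"

# The values observed after the Provisioning Server 5.6 install, as listed above.
CACHE_BLOCKING_VALUES = {
    (WORKSTATION, "EnableOplocks"): 0,
    (MRXSMB, "OplocksDisabled"): 1,
    (WORKSTATION, "Smb2"): 0,
}


def find_cache_blockers(current_values):
    """Return descriptions of the settings that match the cache-blocking state.

    current_values maps (key_path, value_name) to an integer DWORD value.
    Values that are absent from the mapping are not reported.
    """
    return [
        f"{key}\\{name} = {bad}"
        for (key, name), bad in CACHE_BLOCKING_VALUES.items()
        if current_values.get((key, name)) == bad
    ]


if __name__ == "__main__":
    after_install = dict(CACHE_BLOCKING_VALUES)
    for line in find_cache_blockers(after_install):
        print("blocks caching:", line)
```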

Oplocks are turned off so that Provisioning Server failover times can be reduced. However, what the Admin guide fails to detail is that this expedited failover only applies to write cache files that are stored on the Provisioning Server or to vDisks in private mode. Both of these are things that we highly recommend against in a production environment. Think of how we use Provisioning Server most often: it is for shared, standard mode, read only desktops and XenApp servers. For both of these deployments, we always recommend that the write cache be configured as “Cache on Device’s Hard Disk”. Placing the write cache on the Provisioning Server can lead to really poor performance. And if you need to deploy persistent private mode virtual desktops, then you should just let the Hypervisor provide the disk to the VM and not use Provisioning Server. The only time you would have a vDisk in Private mode is when you are making changes to a new master image. HA really isn’t that much of a concern in that scenario as there are other ways to deal effectively with updates.

So, if you are using Provisioning Server to serve read only vDisks and you are placing the write cache on the target device, you can safely enable oplocks and use a CIFS store while still getting all the benefits of the System Cache.

Optimizing Provisioning Server for use with CIFS

I decided to test a CIFS Store in my lab with Provisioning Services optimized for SMB and oplocks. I ran tests using the following two scenarios:

#1 – Pure Windows Server 2008 R2 and SMB 2.1

1 Windows 2008 R2 file server hosting a CIFS share as the Store

2 Windows 2008 R2 Provisioning Servers configured together in HA

– A Windows XP Standard mode vDisk in the CIFS Store

– 6 virtual machine target devices configured to boot from the shared vDisk with write cache on a small HD attached to the virtual machine and provided by the Hypervisor.

#2 – Mixed Windows Server 2008 R2 and Windows 2003 with SMB 1.0

1 Windows 2003 32-bit file server hosting a CIFS share as the Store

1 Windows 2008 R2 Provisioning Server and 1 Windows 2003 x64 Provisioning Server configured together in HA

– A Windows XP Standard mode vDisk in the CIFS Store

– 6 virtual machine target devices configured to boot from the shared vDisk with write cache on a small HD attached to the virtual machine and provided by the Hypervisor.

All servers were configured with 1.5 GB RAM and the target devices with 512 MB RAM (I have limited lab resources).

I optimized the servers as follows:

Windows 2003 File Server

HKLM\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters

"autodisconnect" = dword:0000ffff

Windows 2008 R2 File Server

HKLM\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters

"autodisconnect" = dword:0000ffff

"Smb2" = dword:00000001

Windows 2003 x64 Provisioning Server

HKLM\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters

"EnableOplocks" = dword:0x00000001

HKLM\SYSTEM\CurrentControlSet\services\mrxsmb\Parameters

"OplocksDisabled" = dword:0x00000000

"CscEnabled" = dword:0x00000001

HKLM\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters

"autodisconnect" = dword:0x0000ffff

Windows 2008 R2 Provisioning Server

HKLM\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters

"EnableOplocks" = dword:0x00000001

HKLM\SYSTEM\CurrentControlSet\services\mrxsmb\Parameters

"OplocksDisabled" = dword:0x00000000

"CscEnabled" = dword:0x00000001

HKLM\SYSTEM\CurrentControlSet\services\LanmanServer\Parameters

"autodisconnect" = dword:0x0000ffff

"Smb2" = dword:0x00000001
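If you would rather import these settings than type them by hand, they can be rendered as a .reg file. The sketch below uses the Windows 2008 R2 Provisioning Server values from the list above. The function name and dictionary layout are hypothetical. Note that .reg files need the full root key name (HKEY_LOCAL_MACHINE) rather than the HKLM abbreviation, and write DWORD values as eight hex digits without a 0x prefix. Review the generated text before importing it on a real server.

```python
BASE = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services"

# Values for the Windows 2008 R2 Provisioning Server, copied from the list above.
PVS_2008R2_SETTINGS = {
    BASE + r"\LanmanWorkstation\Parameters": {"EnableOplocks": 1},
    BASE + r"\mrxsmb\Parameters": {"OplocksDisabled": 0, "CscEnabled": 1},
    BASE + r"\LanmanServer\Parameters": {"autodisconnect": 0xFFFF, "Smb2": 1},
}


def to_reg_file(settings):
    """Render {key_path: {value_name: int}} as .reg file text with DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, values in settings.items():
        lines.append(f"[{key_path}]")
        for name, value in values.items():
            # DWORDs are written as eight lowercase hex digits, no 0x prefix.
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")  # blank line between keys
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_reg_file(PVS_2008R2_SETTINGS))
```

The autodisconnect value of 0xffff is 65535 in decimal, far above the 15 minute default discussed later in this post.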

For both tests I used Perfmon on the file servers to capture the “Bytes Sent/sec” counter on the network interface. This allowed me to see how much data was being downloaded from the CIFS share on the file server as virtual machine target devices were booted from Provisioning Server. As a baseline I had previously observed that my Windows XP image reads a little over 200 MB of data during the first 2 minutes after it boots. This is the total amount of data that gets read in order to fully load the GINA and start all of the services in the image. For each test, I booted my six target devices as follows:

  • Boot targets 1 – 3 in 10 second intervals
  • Wait 3 minutes
  • Boot targets 4 – 6 in 10 second intervals
  • Wait 1 minute
  • Reboot targets 1 – 3 in 10 second intervals
  • Wait 3 minutes
  • Reboot targets 4 – 6 in 10 second intervals
  • Wait 3 minutes
  • Shut down targets 1 – 3 in 10 second intervals
  • Wait 60 minutes
  • Boot targets 1 – 3 in 10 second intervals

In all, there were 18 target boots with at least three machines up at all times. HA worked flawlessly and the target devices were always evenly distributed across both Provisioning Servers. In both tests, the file server transmitted just under 500 MB of data during the first 3 minutes of the test. For the entire remainder of the test, 15 target devices were booted up and down and the file server transmitted less than 30 MB of total data over the course of the final hour. The Provisioning Servers were effectively caching the vDisk contents in System Cache RAM with little load on the file server.
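Totals like these can be derived from the “Bytes Sent/sec” samples by multiplying each sample by the sampling interval and summing. The sketch below assumes the samples have already been exported from Perfmon into a plain list of numbers taken at a fixed interval. The function name, the one second default interval, and the use of 1 MB = 1024 × 1024 bytes are all assumptions for illustration.

```python
def megabytes_sent(bytes_per_sec_samples, interval_s=1.0, start_s=0.0, end_s=None):
    """Approximate the data sent in a time window from rate samples.

    Sample i is treated as the rate for the interval starting at i * interval_s.
    Only samples whose start time falls in [start_s, end_s) are counted.
    """
    total_bytes = 0.0
    for i, rate in enumerate(bytes_per_sec_samples):
        t = i * interval_s
        if t < start_s or (end_s is not None and t >= end_s):
            continue
        total_bytes += rate * interval_s
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Made-up samples: 1 MB/s for three minutes, then an idle minute.
    samples = [1024 * 1024] * 180 + [0] * 60
    print(megabytes_sent(samples, end_s=180))    # first three minutes
    print(megabytes_sent(samples, start_s=180))  # remainder
```

For comparison with the results above: if each of the two Provisioning Servers read the boot working set of a little over 200 MB exactly once, that alone would account for a bit over 400 MB of the traffic seen in the first 3 minutes.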

CIFS and Provisioning Server – A Winning Combination!

Being able to effectively use CIFS will be of great value to many deployments. You can now have a single shared Store that provides excellent performance without all the management headaches of a clustered file system or the clunky read only LUNs feature of Provisioning Server! And the great news is that this even works with Windows 2003 and SMB 1.0! However, I would still recommend going with SMB 2.1 and Windows Server 2008 R2.

There are a few things to keep in mind if you are going to make this work.

1) Use Provisioning Server 5.6 SP1. SP1 for 5.6 is critical to making this work. I noticed that prior to SP1, if all the target devices are idle for 15 minutes, Provisioning Server will close the file handle to the CIFS server and end up flushing the vDisk data out of System Cache. SP1 appears to fix this, as the Provisioning Server will always keep the file handle open as long as one target device is booted.

2) Enable all of the Registry Settings I detailed above. Oplocks must be turned on and the autodisconnect value should be turned up above its default value of 15 minutes. You do not want the file server closing the Provisioning Server handle after 15 minutes of idle time, which will cause the System Cache to flush.

3) Dedicate a file server cluster to hosting the vDisk store and give it lots of RAM. Do not use the file server for any other purposes. The file server must read the vDisk from its hard disk. As long as the file server has enough RAM, it will read the vDisk once and store the contents in System Cache as well. In order to calculate the proper amount of RAM for the file server, use the same formula that I defined for Provisioning Servers in the white paper I referenced at the beginning of this post.

http://support.citrix.com/article/ctx125126

4) Give your Provisioning Servers lots of RAM so that they can cache the vDisk contents in System Cache. Refer to the formula in the white paper.

5) Test, Test, Test!!! As always, make sure you properly test your configuration and track the disk and network counters in Perfmon to make sure that caching is working properly before you roll into production.

I hope you found this article valuable and enjoy testing Provisioning Server with CIFS!!!

-Dan Allen