Background & History

I have been asked about this quite a few times over the years, but since I got asked about it twice this past week alone…you know what time it is?  Blog time! 😉  I selfishly need another reference I can point to, but I know this is a controversial topic and a lot of other folks will likely benefit from this info as well.  So here goes…

This is sort of “Part 2” to my original article where I straightened out some conflicting information we had out there regarding creating PVS vDisks in fixed or dynamic format.  If you haven’t read “Part 1” or you’re still creating vDisks using fixed format, you need to stop everything you’re doing and immediately read my original article on this subject!

Thick or Thin?

Assuming you’ve chosen the “Cache on device hard drive” write cache option – which roughly 90% of our deployments use today because it’s the best trade-off between cost and performance – what about the storage volume that hosts the write cache file?  Can that volume be thin provisioned?  Does Citrix support it?  And more importantly, should it be thick provisioned as a best practice?  The short answers to those 3 questions are “Yes”, “Yes” and “Not necessarily”.  (Quick aside before we get to the longer explanation: that leading practice changed about a month ago with the introduction of a new VHDX-based write cache option called “Cache in device RAM with overflow on hard disk“, which my colleague is writing up an article about as we speak, so more on that in a week or so – UPDATE: It’s published!)  Now the longer explanation and answers…

In my opinion, the conflicting information out there on this topic (i.e. “Citrix recommends fixed for the wC” and “I heard Citrix say you can thin provision the wC!”) comes down to 2 reasons:

1.  The write cache is not an ideal candidate for thin provisioning.

2.  Different customers have different priorities (cost vs. performance, etc.).

The first reason is worth discussing, because it’s why we’ve historically said it’s “best” (and safest) to simply thick provision the storage volume that hosts the wC file.  Why?  Look at the I/O profile: small 4K blocks, written very frequently, and almost all of it random.  That’s the worst-case scenario for something like thin provisioning.  If we were infrequently reading (or writing, for that matter) in large sequential blocks, we wouldn’t even be having this discussion – I’d tell you to thin provision it all day long and be done with it.  You’d save a ton of money and never really have to worry about performance degradation, no matter what type of storage backed the wC.  But since we’re dealing with the worst-case I/O profile, you need to exercise some caution before simply defaulting to thin provisioning.
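If you want to feel that difference on your own storage before deciding, here’s a rough Python sketch that compares frequent random 4K writes against large sequential writes on a scratch file.  To be clear, this is not how PVS itself writes the wC file – a purpose-built tool like Iometer or fio will give you better numbers – it’s just a directional illustration of the access pattern, with placeholder file path and sizes:

```python
import os
import random
import time

PATH = "wc_io_test.bin"          # placeholder: put this on the volume you want to test
FILE_SIZE = 256 * 1024 * 1024    # 256 MB working set, small enough for a quick run
WRITES = 1000                    # number of I/Os per pass

def run_pass(block_size, sequential):
    """Time WRITES synchronous writes of block_size bytes, random or sequential."""
    buf = os.urandom(block_size)
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(PATH, flags)
    try:
        os.ftruncate(fd, FILE_SIZE)
        blocks = FILE_SIZE // block_size
        start = time.perf_counter()
        for i in range(WRITES):
            if sequential:
                offset = (i % blocks) * block_size
            else:
                # Jump to a random aligned offset, roughly like a wC-style workload.
                offset = random.randrange(blocks) * block_size
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, buf)
            os.fsync(fd)  # flush each write so the OS cache doesn't hide the latency
        return time.perf_counter() - start
    finally:
        os.close(fd)

if __name__ == "__main__":
    rand_4k = run_pass(4 * 1024, sequential=False)
    seq_1m = run_pass(1024 * 1024, sequential=True)
    print(f"random 4K writes:     {WRITES / rand_4k:7.0f} IOPS")
    print(f"sequential 1M writes: {WRITES / seq_1m:7.0f} IOPS")
    os.remove(PATH)
```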

The second reason is also valid, and it’s why I’ve told some customers in the past to flat-out use thin provisioning for the wC.  Thick provisioning the storage volumes that host the wC to sort of “guarantee” performance is one thing – but you have to realize this can cost organizations doing large-scale XA or XD deployments A LOT OF CASH.  Just take the scenario where we might attach a 5 GB drive to each XD-based VM…or a 15 GB drive to each XA-based VM.  If you have 10k desktops or 20k users, the storage capacity costs associated with thick provisioning each of those volumes can be pretty expensive, especially if you’re ultimately storing these on your enterprise array.  So cost is absolutely a factor that has to be considered as well, even if that means taking a small performance hit by thin provisioning the volumes that host the wC (and other data like event logs and the pagefile, which you shouldn’t forget about either).  The bottom line: you can save a ton of money by leveraging thin provisioning.
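To put some hypothetical numbers on that, here’s a quick back-of-the-napkin sketch.  The per-GB cost and the XA VM count are pure assumptions (your pricing and scale will differ); the point is simply that thick-provisioned capacity multiplies out fast:

```python
# Back-of-the-napkin capacity math for the thick-provisioned examples above.
# COST_PER_GB and the XA VM count are made-up placeholders; plug in your own.
COST_PER_GB = 3.00  # USD per GB of enterprise array capacity (assumption)

scenarios = [
    # (label, number of VMs, wC drive size in GB)
    ("XD: 10,000 desktop VMs x 5 GB each", 10_000, 5),
    ("XA: 2,000 session hosts x 15 GB each (hypothetical VM count)", 2_000, 15),
]

for label, vm_count, gb_each in scenarios:
    total_gb = vm_count * gb_each
    print(f"{label}: {total_gb:,} GB (~{total_gb / 1024:.0f} TB), "
          f"~${total_gb * COST_PER_GB:,.0f} thick-provisioned")
```

With thin provisioning, you only burn physical capacity for blocks that are actually written, so the gap between allocated and consumed space is where the savings come from.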

A “New” Best Practice?

So the net-net: PVS absolutely works with thin provisioning, Citrix supports it for the wC, and we have a lot of large enterprise customers that have been doing this for years, whether we like it or not!  These days I think it’s almost a no-brainer to leverage thin provisioning if you’re ultimately storing these volumes on an enterprise array with ample cache and likely hundreds of spindles behind it.  Almost every major storage vendor has proven time and again that thin provisioning (and dedupe) carries zero or minimal performance degradation, as long as the controllers have ample processing power.  So be confident in that scenario.

The scenario where we still might recommend thick provisioning is when you’re ultimately storing the wC on local storage.  Now, “local storage” can mean a lot of things these days, especially with SSDs, sophisticated RAID algorithms and technologies like VSAN, so I’m not going to attempt to tackle every scenario and dictate where or when you shouldn’t leverage thin provisioning.  I know some customers that are in fact using thin provisioning with “local storage” and it works perfectly fine, even from a performance perspective.  But if you’re using “old school” local storage with, say, just 2 spindles at your disposal, that’s where you might see a significant performance penalty from thin provisioning the volume hosting the wC and other files.  So in those scenarios, either go thick to be safe or TEST (the I/O sketch earlier in this article is a decent starting point).

What about VHDX or Disk Alignment Issues?

You could probably tell from my blurb above that I’m a little bit excited about this new wC option (RAM with overflow to disk) we recently unveiled – I seriously think it will save lives and change the way we architect Citrix environments going forward.  Stay tuned for a full write-up by my colleague on why it’s so awesome (UPDATE: It’s published!)…but I do want to touch on a couple of important things related to the new wC option in this article.

First, this new wC option is VHDX-based, which is different from the “legacy Ardence” or PVS format we use today for all of the existing wC options.  So what does that mean?  Well, for starters, the VHDX format will likely mean larger wC files than the older format.  The jury is still out on how much larger the wC files will be over time in the steady state, but we definitely know they are a lot bigger initially.  That said, we grow the file in 2 MB chunks and we support TRIM on the newer operating systems, so unlike the old wC format, these files may actually shrink over time.  And by the way – MSFT (who owns the VHDX spec) chose the 1 and 2 MB block sizes on purpose, to play nice with almost every storage vendor out there.  So do we support thin provisioning with this new wC option/format, and do we foresee any disk alignment issues with VHDX?  Yes and No, respectively!  And I’ve already confirmed this with our Engineering team.
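Since TRIM is part of what lets these VHDX-based wC files shrink, it’s worth double-checking that it’s actually enabled inside your target devices.  On Windows this is governed by the DisableDeleteNotify setting, which you can query with the built-in fsutil tool.  Here’s a minimal Python wrapper around that command – purely a convenience sketch (the “= 0” string check is a rough parse, and the exact output format varies a bit by Windows version), and running fsutil directly in a command prompt works just as well:

```python
import subprocess

# Query Windows' delete-notification (TRIM) setting via the built-in fsutil tool.
# DisableDeleteNotify = 0 means TRIM/unmap notifications are ENABLED.
result = subprocess.run(
    ["fsutil", "behavior", "query", "DisableDeleteNotify"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())

# Rough parse; adjust for your Windows version's output format if needed.
if "= 0" in result.stdout:
    print("TRIM is enabled – the VHDX-based wC can hand blocks back over time.")
else:
    print("TRIM appears disabled – don't expect the wC file to shrink.")
```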

Wrap-up (& One Final Recommendation)

I’ve purposely over-simplified “thin provisioning” in this article.  I realize you can thin provision at the VM/hypervisor layer and also at the array layer (“thick on thin” vs. “thin on thin”, etc.).  And I realize there are even slight variations of “thin” on the VM/hypervisor side that have to do with how space is allocated vs. when blocks are zeroed, especially in ESXi (truly “thin” vs. “lazy zeroed”).  But I wanted to keep this article short, sweet and simple.  If these concepts are new to you or you want a couple of good references on thin provisioning in general, check out these articles I’ve pointed people to in the past (this, this and this).

Lastly, if you’re going to take advantage of thin provisioning, I have one important recommendation or “ask” – please monitor.  Remember that thin provisioning is a form of over-commitment…there is only so much physical space, and when we run out of it, we hit nasty “out of space” conditions and bad things happen.  Very bad things.  Trust me – you want to be on PTO when one of these conditions hits. 😉  So please monitor your allocation and disk usage, and set up proactive alerts (at both the hypervisor and array layers if you’re going thin on thin) to avoid these ugly situations, as sketched below.
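Proper monitoring tools at the hypervisor and array layers are the right way to do this, but just to make the idea concrete, here’s a minimal Python sketch that watches a guest volume and complains when usage crosses a threshold.  The volume path, threshold and interval are all placeholder assumptions – in real life you’d wire this into whatever alerting system you already have:

```python
import shutil
import time

VOLUME = "D:\\"         # placeholder: the volume hosting the wC file
THRESHOLD = 0.80        # alert when the volume is more than 80% used (assumption)
INTERVAL_SECONDS = 300  # check every 5 minutes

while True:
    usage = shutil.disk_usage(VOLUME)
    used_fraction = usage.used / usage.total
    if used_fraction > THRESHOLD:
        # Stand-in for a real alert (email, SNMP trap, monitoring API, etc.)
        print(f"ALERT: {VOLUME} is {used_fraction:.0%} full "
              f"({usage.free / 2**30:.1f} GB free)")
    time.sleep(INTERVAL_SECONDS)
```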

Hope that helps clear the air.  Remember, there are best practices and then there’s reality.  It’s OK to go against the norm if you know what you’re doing. 😉

-Nick

Nick Rintalan, Lead Architect, Americas Consulting, Citrix Consulting Services (“CCS”)