I’ll begin by saying that this solution is certainly not for everyone.  It adds complexity.  It creates additional management overhead.  It requires NetScaler integration and mildly advanced networking skills.  We have never publicly documented the solution before this week.  But it is absolutely the only bulletproof solution for making our License Server (LS) truly highly available in an active/active fashion without any downtime.

And while we finally got around to documenting the solution this past week, it’s not a “new” solution we just invented.  Our Consulting team has quietly kept it in our bag of tricks and has been implementing it (or subtle variants) for several years now for our largest customers with the most demanding uptime/resiliency requirements.  But this past week, while working with Entisys on an XA design for one of our largest customers in the world (~5k XA servers, not a typo), we made the solution even better by building it out in a lab, running Wireshark traces to find every port in use, and testing advanced failover scenarios that simulate black holes and acquisition errors (which can cause our License Server to NOT go into grace period, and ultimately create a DoS that takes down an entire Citrix environment in minutes).

The Bottom Line

I am not going to go into a ton of detail here (since Dane has already done a fantastic job documenting the solution and all the details), but a few quick comments on how the solution works at a high level:

  • We stand up 2 Citrix License Servers with identical hostnames on separate networks (non-domain joined)
  • We use NetScaler and PowerShell (PoSh) to intelligently monitor all LS services and ports on each LS
  • We direct traffic to only ONE LS at any given time (this ensures EULA compliance)
  • Only when the primary LS is “down” (black-holed or otherwise) do we fail over and start routing traffic to the secondary LS (which is already up and has all licenses installed)
  • Please note this solution only works and has been tested with XA/XD (no XM support at this time)
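
To make the routing rule in the bullets above concrete, here is a minimal Python sketch of the decision logic only.  It is not the NetScaler/PoSh configuration from Dane’s write-up: the function names are hypothetical, the port numbers are placeholders (take the real list from your own traces), and a plain TCP connect is a much weaker check than the service-level monitoring described above, since it will not catch a listener that accepts connections but never answers.

```python
import socket
from typing import Callable, Optional, Sequence

# Placeholder port list for illustration only (an assumption, not the
# documented list); substitute the ports found in your own traces.
MONITORED_PORTS: Sequence[int] = (27000, 7279, 8082)

Probe = Callable[[str, int], bool]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def is_healthy(host: str,
               ports: Sequence[int] = MONITORED_PORTS,
               probe: Probe = port_open) -> bool:
    """Treat a license server as up only if every monitored port answers."""
    return all(probe(host, port) for port in ports)


def pick_active(primary: str,
                secondary: str,
                ports: Sequence[int] = MONITORED_PORTS,
                probe: Probe = port_open) -> Optional[str]:
    """Return the single server that should receive traffic, or None.

    The primary always wins while it is healthy, so traffic goes to
    exactly one license server at any given time.
    """
    if is_healthy(primary, ports, probe):
        return primary
    if is_healthy(secondary, ports, probe):
        return secondary
    return None
```

Calling `pick_active("10.0.1.10", "10.0.2.10")` (both addresses made up) would probe the primary first and fall back to the secondary only if at least one of the primary’s monitored ports fails to answer.  The `probe` parameter exists so the check can be swapped for something stricter, or stubbed out in a test.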

So it’s actually fairly straightforward at the end of the day (we are load balancing two boxes!), but it definitely requires some advanced configuration.  Dane’s example even shows how we can do this with GSLB to protect against site-wide outages or data center failures.  And this works with our latest License Server and with either IMA or FMA (as shown in Dane’s example).

Special thanks to Dane and Vic at Entisys, and Brendan, JayC and GeorgeT from CCS (and all of our customers who have been “bit” by dreaded 500 errors or LS black holes over the years – this is for you!). 😉

-Nick

Nicholas Rintalan, Lead Architect & Director, Americas Field Services, Citrix Consulting