It is said that medieval scholars used to debate how many angels could dance on the head of a pin. These days, the more practical question I am frequently asked is: how many users can share a GPU?

With today’s Tech Preview release of XenApp 6.5 OpenGL GPU Sharing, available to XenDesktop and XenApp Enterprise and Platinum customers under Subscription Advantage, you can now arrive at your own answer to the users-per-GPU question. To download the Tech Preview, log in to the Citrix Downloads page with your MyCitrix credentials, select “XenApp”, then “Betas and Tech Previews”, and click Find; the Tech Preview is posted at http://citrix.com/downloads/xenapp/betas-and-tech-previews/opengl-gpu-sharing-feature-add-on.html.

XenApp 6.0 introduced HDX 3D GPU Sharing for DirectX apps in March 2010, and as 2012 came to an end we released a private Tech Preview of OpenGL GPU Sharing to selected partners and customers. The most demanding “tier 1” graphics professionals (such as design engineers who are used to having a powerful graphics workstation at their desk) generally require a dedicated GPU for top performance. XenApp GPU Sharing is aimed at “second tier” users, who may not be designing the next airplane or automobile but need to view or edit large 3D models that can’t be satisfactorily delivered using CPU-based software rasterization. Because this technology uses the GPU video driver directly, it doesn’t suffer from the performance limitations of GPU virtualization implementations based on API Intercept, where the graphics commands have to be transferred from the user session to the session that controls the graphics processor. Nor is it limited to older versions of DirectX and OpenGL. It can be used either on bare metal or with a hypervisor that supports GPU passthrough, such as XenServer 6.x or vSphere 5.1.

Comments from our early Tech Preview participants have been very positive and informative:

  • I tested ten OpenGL applications including some Tier 2 apps/games with OpenGL GPU Sharing on XenApp 6.5 and it works brilliantly.
  • On a system with two NVIDIA Q4000 cards we ran 18 users (ten in one XenApp VM and eight in another VM) using a test app that works with ESRI ArcGIS, and we still had space for more.
  • Our customer purchased a new dedicated Dell server with an NVIDIA Quadro 6000 and we installed XenApp 6.5 with the OpenGL GPU Sharing feature add-on to test Dassault SolidWorks. The customer said this is “AWESOME”! They said it makes a huge difference; it looks and responds just like at the console. In some cases it performs better than their dedicated desktop systems! They are really impressed.
  • Running Dassault SolidWorks, Ansys Workbench and Fluent, our tests indicate that customers will hit a CPU limitation before they hit a GPU processing limit or GPU RAM issues. We first tested using an NVIDIA Quadro 4000 card. We got good performance numbers per GPU as the models are not overly complex. Scalability was 6 to 10 users per Q4000. With the new NVIDIA GRID K2 card it seems that the CPU will be the limiting factor. Currently our test XenApp servers have eight cores and around 16-32GB of RAM depending on the application.
  • We tested on the NVIDIA Quadro 6000 card (448 CUDA cores) with four users all running our most complex animation. The animation runs for 166 seconds with one user. With four users it took just three seconds longer!  Considering our timing is manual stopwatch, the margin for error basically says no slowdown with four users. In fact, the Quadro 6000 was able to support 30 users running Dassault 3DVIA Composer Player with only minor slowdown. At 33 users a few users started to experience jerky motion, but the app was still usable. 40 sessions seems to be the limit.  The GPU was maxed, not the video memory.  My test case was users running the animation non-stop, whereas in real world usage the animation is like a training video and it has forced pause points, so real users would stop at times to read or do work. The point is that the test was harder on the graphics card than the real world is, yet we could run about 30 concurrent users on a Q6000.
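
To put the figures quoted above on a common footing, here is a minimal Python sketch of the arithmetic behind two of them: the relative slowdown in the Quadro 6000 animation test and the average density in the two-card Quadro 4000 test. The function names are hypothetical helpers introduced only for illustration, and the density figure assumes the 18 users were spread across both cards, which the comment implies but does not state outright.

```python
def slowdown_percent(baseline_s, loaded_s):
    """Relative increase in run time, as a percentage of the baseline."""
    return (loaded_s - baseline_s) / baseline_s * 100.0


def users_per_gpu(total_users, gpu_count):
    """Average number of concurrent users per GPU (assumes an even spread)."""
    return total_users / gpu_count


# Quadro 6000 animation test: 166 s with one user, three seconds longer with four.
animation_slowdown = slowdown_percent(166, 166 + 3)

# Two Quadro 4000 cards serving 18 users in total (ten in one VM, eight in another).
arcgis_density = users_per_gpu(18, 2)

print(f"Slowdown with four users: {animation_slowdown:.1f}%")  # prints 1.8%
print(f"Average users per Q4000: {arcgis_density:.1f}")        # prints 9.0
```

A slowdown of under 2% is well within the error of a manual stopwatch, which matches the tester's conclusion. The nine-users-per-card average is a lower bound rather than a limit, since that tester reported spare capacity.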

Additional tests are now underway with NVIDIA’s new GRID K2 card, which offers two high-end Kepler™ GPUs. This card is specifically designed for cloud-based visualization and high-performance computing. With 3072 total CUDA cores and 8 GB of video RAM, the K2 has the potential to deliver even higher user density.

Thomas Poppelgaard of Commaxx has kindly prepared this video to show the performance of XenApp GPU Sharing: http://www.youtube.com/watch?v=yH1vHAUSL98&feature=youtu.be

By the way, if you can attend NVIDIA’s GPU Technology Conference in San Jose, California, March 18-21, you’ll be able to hear Thomas speak on the topic “Successfully Delivering 3D Graphics Solutions for Your Business”. Thomas has implemented Citrix 3D graphics delivery systems for several major customers in Denmark and will share his experiences and practical tips on how to achieve outstanding performance. I will also be speaking at the conference, on the topic “Delivering 3D Graphics from the Cloud with XenApp and XenDesktop VDI”.

All the major hardware platform vendors (Cisco, Dell, HP, IBM and SuperMicro) are certifying platforms that support the new GRID cards. If you’re able to get your hands on one of these new servers, please share your test results on our XenApp GPU Sharing Technology Preview Support Forum.

Derek Thorslund
Director of Product Management, HDX