How do Citrix’s HDX technologies solve the challenge of delivering softphones and Unified Communications applications from a virtual desktop or as hosted applications? Just as with multimedia playback, we use two complementary approaches.

Our “generic” HDX real-time technologies are designed to deliver any softphone or UC app without the need to modify or hook into it in any way. Our “optimized” HDX real-time architecture, in contrast, shifts the media processing workload from the hosted app itself to a media engine running on the user device, thereby maximizing server scalability, minimizing network bandwidth consumption and ensuring zero degradation of audio-video quality.

Let’s take a closer look at each of these approaches and how they work together.

Generic HDX RealTime

With our generic HDX real-time technologies, the audio output from the softphone or UC application is intercepted, compressed using the Optimized-for-Speech codec and sent over the ICA protocol to the user device. There, the Citrix Receiver decodes it and directs it to the user’s headset or speakers. Video output is rendered by the app on the server and detected by Adaptive Display, which recognizes video regions by the frequency with which the screen is updated. Adaptive Display dynamically adjusts to the network,
selecting the combination of compression level and frame rate that will deliver the best possible user experience over the available bandwidth, within the limits set by policy. The Citrix Receiver decodes this traffic and composites it into the desktop.

In the other direction, the Citrix Receiver uses the Optimized-for-Speech codec to compress audio input from the user's microphone. If the user is participating in a video conference, Webcam Video Compression reduces the required upstream bandwidth, typically to 300-600 Kbps.

The HDX audio subsystem has been designed for broad app compatibility. It handles the peculiarities of softphones, which may open the audio device first for the ringtone and then a second time to establish the voice path. Audio Plug-n-Play allows the user to connect or unplug a headset or other audio device at any time; it doesn't have to be plugged in when the user logs in. And Audio Device Routing lets users direct the ringtone to their speakers but call audio to their headset, which is convenient for users who remove their headset between calls.

Jitter buffering smooths out the audio when there are variations in network latency, so that it doesn't speed up and slow down due to packets arriving at an inconsistent rate. While buffering adds a small amount of delay, this is more than offset by the reductions in latency that our engineers have achieved by carefully tuning the HDX audio stack. Further audio latency reduction is achieved on XenDesktop and VDI-in-a-Box by selecting UDP/RTP transport in the ICA protocol, since UDP has no requirement for packets to be acknowledged, which could introduce delay if the network is lossy (as might be the case on a wireless connection) or congested.
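The trade-off a jitter buffer makes can be sketched in a few lines. This is a minimal, generic playout buffer, not the HDX implementation: packets are held briefly and released in sequence order at a steady tick, exchanging a small fixed delay for smooth audio.

```python
# Minimal jitter-buffer sketch (illustrative, not the HDX audio stack).
import heapq

class JitterBuffer:
    def __init__(self, prefill=3):
        self.prefill = prefill   # packets held before playout starts (the added delay)
        self.heap = []           # min-heap ordered by sequence number
        self.started = False
        self.next_seq = None

    def receive(self, seq, payload):
        """Store an arriving packet; out-of-order arrivals are reordered by the heap."""
        heapq.heappush(self.heap, (seq, payload))

    def playout(self):
        """Called at a steady playout tick. Returns the next payload in order,
        or None on underrun (a late or lost packet; a real stack would run
        loss concealment here rather than go silent)."""
        if not self.started:
            if len(self.heap) < self.prefill:
                return None      # still buffering
            self.started = True
            self.next_seq = self.heap[0][0]
        if self.heap and self.heap[0][0] == self.next_seq:
            _, payload = heapq.heappop(self.heap)
            self.next_seq += 1
            return payload
        self.next_seq += 1       # skip the missing packet to keep cadence
        return None
```

A deeper prefill tolerates more jitter but adds more delay, which is why tuning the rest of the audio path matters: the latency saved elsewhere is what pays for the buffer.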

Quality-of-Service (QoS) routing on the network also minimizes latency in the audio path. The ICA protocol used by HDX supports multiple data streams. By default, audio traffic is assigned to the highest priority stream. Virtual channel priorities within a data stream also help. And there is support for packet tagging, both DSCP tagging for RTP packets (Layer 3 tagging) and WMM tagging for WiFi.
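As a point of reference, DSCP marking of a UDP socket uses standard socket options. The snippet below is a generic example of Layer 3 tagging, not Citrix code; DSCP 46 (Expedited Forwarding) is the conventional marking for interactive voice, though whether the network honors it depends on its QoS configuration.

```python
# Generic example of DSCP tagging for outbound audio datagrams.
import socket

EF_DSCP = 46  # Expedited Forwarding, the usual marking for voice (RFC 3246)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# The DSCP field occupies the top six bits of the former TOS byte,
# so shift left by two to leave the ECN bits clear.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_DSCP << 2)

# Every datagram sent on this socket now carries DSCP 46 in its IP header,
# so QoS-aware routers can queue the audio ahead of bulk traffic.
```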

Echo cancellation is important when using speakers and a microphone instead of a headset, such as when in a private office or in a meeting room. This technology was improved last year to be more tolerant of variations in the distance between the speakers and the microphone.

Our generic HDX real-time technologies are capable of delivering very good audio-video quality. Some customers have commented that they’re obtaining better sound quality with HDX than they had before with physical telephone sets. That may be due to the modern codec technologies we use.

When engineering a desktop virtualization system for real-time communications, it is important to take into account the impact that these applications can have on server scalability. With softphones and voice
chat features, it is critical to minimize latency in the audio path. If too much load is put on the server, unacceptable delays can be introduced that interfere with normal conversation. As a rule of thumb, the number of simultaneous audio conversations that can be supported on a server is about half the number of typical office application users. This is especially important to consider in a call center environment, where all users may be using their softphones at the same time. Likewise, the CPU impact of a user engaged in a video chat session is several times greater than that of a user running only typical office apps.
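The rule of thumb above translates directly into a back-of-envelope sizing calculation. The helper names and figures here are hypothetical, purely to illustrate the arithmetic:

```python
# Back-of-envelope capacity check based on the ~50% rule of thumb above.

def max_concurrent_calls(office_user_capacity):
    """Simultaneous audio conversations ~= half the office-worker density."""
    return office_user_capacity // 2

def servers_needed(agents, office_user_capacity, concurrency=1.0):
    """Servers for a deployment where `concurrency` is the fraction of
    agents on a call at peak (1.0 = everyone, the call-center worst case)."""
    calls = int(agents * concurrency)
    per_server = max_concurrent_calls(office_user_capacity)
    return -(-calls // per_server)   # ceiling division

# Example: a server rated for 100 office users supports ~50 simultaneous
# calls, so 300 always-on-the-phone agents need 6 such servers.
```

The same exercise, repeated with a higher per-user cost for video chat, shows why video-heavy workloads reduce server density even further.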

Optimized HDX RealTime

Citrix’s two-pronged approach to delivering softphone and unified communications applications helps maximize server scalability and reduce the cost of desktop virtualization. The idea behind the optimized HDX real-time architecture is to offload the server for the most prevalent and demanding usage scenarios, leveraging the processing power of the user device whenever feasible. This is analogous to how we handle multimedia playback; in addition to a set of generic technologies for video and audio playback, we provide optimized solutions for Adobe Flash and a multitude of Windows media formats.

If you think about the layers of software that make up a softphone or UC application, you can picture a User Interface layer at the top, some business logic in the middle, and down at the bottom a “media engine” that handles signaling, encoding and decoding. The trick to offloading the server for maximum scalability is to shift all of the media processing to the user device. This implies moving the media engine to the endpoint. The inter-process communication between the business logic layer and the media engine then needs to occur over the network on a virtual channel.
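The split described above can be sketched as two cooperating components with a control channel between them. Everything here is hypothetical (class names, message format, operations); it only illustrates the principle that the virtual channel carries small signaling messages while the heavy media work happens on the endpoint.

```python
# Illustrative sketch of the media-engine split (all names hypothetical):
# business logic stays in the hosted app; the media engine runs on the user
# device; only small control messages cross the virtual channel.
import json

class VirtualChannel:
    """Stand-in for an ICA virtual channel: carries control messages only."""
    def __init__(self):
        self.endpoint_handler = None

    def send(self, message):                      # server -> endpoint
        return self.endpoint_handler(json.dumps(message))

class LocalMediaEngine:
    """Runs on the user device; audio/video never traverses the server."""
    def __init__(self, channel):
        channel.endpoint_handler = self.on_control
        self.active_call = None

    def on_control(self, raw):
        msg = json.loads(raw)
        if msg["op"] == "start_call":
            # A real engine would negotiate codecs and open media
            # streams peer-to-peer (or to a conferencing server) here.
            self.active_call = msg["peer"]
            return "ok"
        if msg["op"] == "end_call":
            self.active_call = None
            return "ok"

class HostedSoftphone:
    """UI and business logic on the server; delegates media to the endpoint."""
    def __init__(self, channel):
        self.channel = channel

    def dial(self, peer):
        return self.channel.send({"op": "start_call", "peer": peer})
```

Note what crosses the channel: a few bytes of JSON per call event, instead of a continuous encoded media stream. That asymmetry is the whole point of the architecture.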

When you move the media engine to the user device, there’s no longer any need for the audio and video traffic to go through the Citrix server. Instead, this traffic flows peer-to-peer, directly between the two
parties to the conversation or, in the case of a multi-party call, from each user to a conferencing server. This brings significant benefits beyond server scalability: network bandwidth consumption is greatly reduced, no latency is added, and there is zero degradation in audio-video quality.

This architecture can be implemented in two ways. Ideally, the application vendor modifies their softphone so that the media engine can run separately from the other layers of software; Citrix facilitates this by providing APIs for signaling over ICA and for properly positioning locally-rendered video on the virtual desktop. Alternatively, the vendor's application may offer mechanisms that allow a third party to modify how voice and video calls are initiated and controlled.

Cisco was the first major Unified Communications vendor to implement an optimized architecture for XenDesktop. Cisco’s VXI solution for Unified Communications Manager supports a variety of devices including the VXC 6215 Linux-based thin client, the VXC 2112 telephone “backpack” (zero client) and the VXC 4000 software appliance for Windows PCs. All of these offerings eliminate “hairpinning” and maximize XenDesktop server scalability by running the media engine on the endpoint.

Another very popular Unified Communications application used by Citrix customers today is Microsoft Lync. Citrix recently introduced an Optimization Pack for Microsoft Lync 2010 that is currently available with XenDesktop 5.6 Feature Pack 1 and XenApp 6.5 Feature Pack 1. The optimization pack consists of two components. The HDX Connector for Lync is installed alongside the hosted Lync client to redirect call initiation requests and other signaling over an ICA virtual channel to the second component, the HDX RealTime Media Engine, a plug-in to the Citrix Receiver. In addition to proprietary Lync codecs licensed from Microsoft (namely, RT Audio and RT Video), the HDX RealTime Media Engine includes a variety of industry-standard codecs such as H.264 and various flavors of H.263 to facilitate interoperability with other products (for example, in-room video conferencing systems). The Optimization Pack for Lync requires no changes to existing Lync infrastructure such as the Lync Front-End Server, Lync Audio/Video Conferencing Server and Lync Edge Server.

Avaya has long supported Citrix XenApp customers with Unified Communications and Contact Center solutions that allow the soft client to share control with a desk phone. Avaya's new optimized VDI Communicator solution for XenDesktop and thin clients ensures uncompromised voice quality through local media processing and supports large-scale deployments while reducing capital expenditure and TCO.

In summary, Citrix’s goal is to partner with the leading UC vendors in the market to create the optimization packs that will deliver the greatest benefit to our desktop virtualization customers. We regularly check with our customers to identify which applications and devices are most popular, since optimizing these will have the biggest positive impact on server scalability and user experience. We also continue to enhance our generic HDX real-time technologies so that users are never left unable to access the app they need at the moment. The generic and optimized technologies are designed to work together; for example, a user might at one moment access Microsoft Lync from a device that supports the optimized HDX architecture, then join a Cisco WebEx or Adobe Connect video conference using our generic HDX real-time technologies.

The net result is that HDX lets people work from anywhere, on any device, using the voice and video chat features they need for effective communications.

Derek Thorslund
Director of Product Management, HDX