VMware reacts to the Virtual Reality Check benchmarks

Just yesterday virtualization.info covered the amazing work of Ruben Spruijt (Solution Architect and CTO at PQR) and Jeroen van de Kamp (Enterprise Architect and CTO at Login Consultants), two well-known and respected virtualization experts who lead two separate Citrix and VMware solutions providers.

Their Virtual Reality Check project is a performance analysis of the leading hypervisors (VMware ESX, Citrix XenServer and Microsoft Hyper-V) when running typical Microsoft Terminal Services/Citrix XenApp workloads: a Windows XP virtual desktop loaded with Outlook 2007 and Acrobat Reader 8.

Unsurprisingly, the post achieved one of the highest page view counts in the history of virtualization.info, even though other prominent influencers had already covered the project the previous week.

The non-sponsored results published by Spruijt and van de Kamp generated a lot of reactions as their conclusion on Citrix XenApp is:

Not having the ability to overcommit virtual machine memory is an clear disadvantage when virtualizing desktops. Such a feature allows much more VM’s to be run than physical memory normally would allow, which makes a virtual desktop solution much more economical.

…

XenServer is clearly optimized for Terminal Server and XenApp workloads, achieving near bare metal performance and even higher user densities than bare-metal configurations. This is possible because 32-bit 2003 terminal server with 4GB memory is relatively very efficient in comparison to other Windows operating systems.

While Microsoft didn’t comment (it has no interest in doing so), VMware reacted immediately: the company’s performance team published a new benchmark on Jan 30, just a few days after the Virtual Reality Check project was announced (Jan 26).

The VMware performance study compares XenServer 5.0 and ESX 3.5.0 Update 3 when running Citrix XenApp workloads, and reports results that are at odds with what Virtual Reality Check found:

ESX supports about 13% more users than XenServer at a given latency while using less CPU.

Why are the benchmarks so different?

Stats and polls can be read in several different ways and manipulated as needed.
Simon Crosby, CTO of the Virtualization and Management division at Citrix, offers a possible reading:

…

the VMware “study” is not a thorough exploration of a valid set of parameters for the Terminal Services / XenApp workload. Instead, it is a narrow look at a particular set of configurations which are not reasonable in practice:

No test of 32 bit workloads – the primary candidates for server consolidation for this workload because a 32 bit OS exhausts its memory at 4 GB and a modern server can pack hundreds of GB and many cores. Our work in this area has shown a compelling benefit to virtualizing TS/XenApp 32 bit workloads on XenServer, and an equally compelling set of reasons not to use ESX for this purpose.

Unrealistic configuration – The server used in the tests is certainly punchy – the machine had 64 GB RAM and 4 processors–each with 4 cores (16 total processor cores). Anyone familiar with 64b TS/XenApp knows this machine could easily support hundreds of XenApp sessions. But the “scientists” at VMware don’t. They instead chose to run exactly one VM (with only 2 vCPU’s and using only 25% of the available memory) and XenApp at minimal levels of concurrency (i.e. 10-40 users). No multi-VM scenarios, no tests at useful user-counts. Based on their measurements they appear to gleefully extrapolate deeper into the realm of fiction to proudly pronounce their horse the winner.
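
To put the figures in Crosby's critique in perspective, here is a quick back-of-the-envelope calculation. It is our own sketch and not part of either study. The numbers come straight from the quote above; the function name `host_share_used` is hypothetical, and the sketch assumes that one vCPU maps to one physical core, which ignores scheduling details and hypervisor overhead.

```python
# Figures as quoted by Crosby for the VMware test setup.
HOST_CORES = 4 * 4        # 4 processors, each with 4 cores
HOST_RAM_GB = 64          # total RAM in the test server
VM_VCPUS = 2              # vCPUs assigned to the single VM
VM_RAM_FRACTION = 0.25    # "only 25% of the available memory"


def host_share_used(vm_vcpus, host_cores, vm_ram_fraction, host_ram_gb):
    """Return (cpu_share, vm_ram_gb) for a single VM on the host.

    Assumption: one vCPU corresponds to one physical core.
    """
    cpu_share = vm_vcpus / host_cores
    vm_ram_gb = vm_ram_fraction * host_ram_gb
    return cpu_share, vm_ram_gb


cpu_share, vm_ram_gb = host_share_used(
    VM_VCPUS, HOST_CORES, VM_RAM_FRACTION, HOST_RAM_GB
)
print(f"Cores exposed to the VM: {cpu_share:.1%} of {HOST_CORES}")
print(f"RAM given to the VM: {vm_ram_gb:.0f} GB of {HOST_RAM_GB} GB")
```

Under these assumptions, the single test VM could use at most one eighth of the host's cores and a quarter of its memory, which is the basis of Crosby's objection that the setup says little about a fully loaded server.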

At this point we would welcome an additional comment from Ruben Spruijt and Jeroen van de Kamp, as their work is somewhat called into question by the new VMware study.