Desktop Virtualization Best Practices with VMware HA Admission Control & Atlantis ILIO

Posted by Jim Moyle on March 5th, 2013

Cross-posted from the Atlantis Computing blog

http://blog.atlantiscomputing.com/2013/02/desktop-virtualization-best-practices-with-vmware-ha-admission-control-atlantis-ilio/

In recent months, customers have asked me a lot of questions about how to configure resilience for virtualized desktops. If you want to jump straight to the conclusion, skip to the Conclusion section at the end of this post; for the full explanation, please carry on. The first question that needs to be asked is whether the workload is stateless or persistent.  A stateless desktop workload is created ‘on the fly’ for the user by combining the OS, applications and persona at login.  A persistent workload is usually one where the user has elevated privileges, can perhaps install applications themselves and has a ‘one to one’ relationship with their desktop.

Stateless desktops are created and destroyed as the user needs them. This means that the accidental destruction via host failure shouldn’t matter too much as the user can log back on to any other desktop and continue working. In fact, to the user, it should seem that any desktop they log onto looks exactly like ‘their’ desktop.  This means that, in effect, the resilience for a stateless desktop is taken care of by the broker.  This has large advantages in terms of resources for the desktop virtualization implementation as we no longer need to spend any money protecting any of the desktop VMs by enabling resilience technologies either in the hypervisor or on the physical host.

A persistent desktop is a totally different use case and needs to be treated very differently.  We now need to be able to minimize downtime on that particular VM, as the user may have installed their own apps or made configuration changes to the OS and applications which are not backed up centrally; they may even have stored files on a local drive.  In this case we need to enable resilience technologies for that user. I’m going to concentrate on VMware High Availability as the resilience technology, as it is the one I run into most commonly.

VMware High Availability (HA) is defined as:

For this blog post, I’ll ignore VM and application monitoring and concentrate on the standard HA functionality.   Essentially, VMware HA will use shared storage to restart a VM on another host in the event of a failure.

HA is extremely easy to configure and can be enabled on a cluster in just five clicks.  It is the details that I’ve found people have trouble with.  This is especially true for deployments involving Atlantis ILIO, as the presence of a VM with reservations changes the way we think about HA.

The example I’m going to use is an 8 host cluster with 256GB of RAM on each host with guest Windows 7 VMs.  Each Windows 7 virtual desktop has 2GB RAM, with the Atlantis ILIO VM allocated a reserved 60GB of RAM (Please refer to the relevant Atlantis ILIO sizing guidelines for your environment).  In a stateless VDI deployment, we could probably push the number of users per host up to about 140, but as we are looking at persistent desktops we’ll keep the number down well below that.

The figure below indicates to scale how our host RAM is assigned, with 2GB given to each Desktop and 60GB given to Atlantis ILIO.

 

VMware HA introduces the concept of admission control which states that:

“vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.”

HA admission control can be configured in one of five ways and we will go through each of the configuration options and examine whether it should be applied to our current use case.

HA with admission control disabled

Disabling admission control when you turn on HA would seem contradictory, and you would be right to think so.  Essentially you are saying ‘I want my VMs to restart, but I’m not willing to supply the capacity needed’.  Having said that, it’s still the most common configuration.  To quote Duncan Epping and Frank Denneman:

“Admission Control is more than likely the most misunderstood concept vSphere holds today and because of this it is often disabled.”

I don’t advise disabling admission control in any environment where HA is required as it almost guarantees that you will have increased downtime or decreased performance for your users in the event of a host failure.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the “Host Failures Cluster Tolerates” policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism.  With the Host Failures Cluster Tolerates policy, vSphere HA performs admission control in the following way:

  • Calculates the slot size.  A slot is a logical representation of memory and CPU resources. By default, it is sized to satisfy the requirements for any powered-on virtual machine  in the cluster.
  • Determines how many slots each host in the cluster can hold.
  • Determines the Current Failover Capacity of the cluster.  This is the number of hosts that can fail and still leave enough slots to satisfy all of the powered-on virtual machines.
  • Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user).

If it is, admission control disallows the operation.
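The steps above can be sketched in a few lines of Python. This is only a rough model, not VMware's implementation: it looks at memory only, assumes every powered-on VM occupies exactly one slot, and takes the worst case of the largest hosts failing first. The `Host` class and the function names are my own inventions for illustration. The numbers come from the example cluster (8 hosts with 256GB each), with the 60GB ILIO reservation assumed as the slot size.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    memory_gb: int


def slots_per_host(host: Host, slot_size_gb: int) -> int:
    """Whole slots that fit into one host's memory (memory-only model)."""
    return host.memory_gb // slot_size_gb


def current_failover_capacity(hosts, slot_size_gb, slots_needed):
    """Hosts that can fail while the rest still hold `slots_needed` slots.

    Assumption: the largest hosts are removed first (worst case).
    """
    capacities = sorted(
        (slots_per_host(h, slot_size_gb) for h in hosts), reverse=True
    )
    failures = 0
    # Drop the largest remaining host while the others can still hold every VM.
    while len(capacities) > 1 and sum(capacities[1:]) >= slots_needed:
        capacities.pop(0)
        failures += 1
    return failures


def admit(hosts, slot_size_gb, slots_needed, configured_failover_capacity):
    """Admission control allows the operation only if capacity is sufficient."""
    current = current_failover_capacity(hosts, slot_size_gb, slots_needed)
    return current >= configured_failover_capacity


# Example cluster: 8 hosts x 256GB, slot size assumed to be 60GB.
cluster = [Host(f"esx{i}", 256) for i in range(1, 9)]
print(slots_per_host(cluster[0], 60))            # slots on one host
print(current_failover_capacity(cluster, 60, 28))
print(admit(cluster, 60, 29, 1))
```

With a 60GB slot, the whole cluster offers only 32 slots, so tolerating one host failure caps the cluster at 28 powered-on VMs in this model.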

Slot size is calculated in the following way:

‘The slot size for CPU is determined by the highest reservation or 256MHz (vSphere 4.x and prior) / 32MHz (vSphere 5) if no reservation is set anywhere. HA will use the highest Memory Overhead in your cluster as the slot size for memory.’

When ILIO is involved in slot size calculation it means that the slot size becomes larger due to the memory and CPU reservations.

In this case, it means that each host will only contain four slots due to the Atlantis ILIO memory constraints.  Such large slot sizes can lead to reduced consolidation ratios and shouldn’t be used in this case.

Host Failures Cluster Tolerates with slot size configured

It is possible to configure the slot size using the two advanced settings:

das.slotMemInMB

das.slotCpuInMHz

By using these two advanced settings, it’s possible to change the slot size for HA and get better consolidation ratios. It is also possible to change the slot size in the new web administration console.  This is what it would look like if you changed the slot size to 2GB for memory.

As you can see the Atlantis ILIO VM consumes 30 slots due to its 60GB RAM reservation out of a total of 128 slots per host. In our eight host cluster, if we set the host failures the cluster tolerates setting to one and reduce the slot size to 2GB RAM, we will have 18 or 19 slots reserved per host as below:

This means that there will not be enough free slots (30) on each host to start the Atlantis ILIO VM.  VMware vSphere will try to use DRS (Distributed Resource Scheduler) to move the smaller desktop VMs out of the way to create enough free slots to enable the ILIO VM to be powered on.  This is called defragmentation.  Defragmentation is not guaranteed to work, as it may have to use multi-hop DRS moves or multiple rounds of DRS and it still needs to respect affinity, anti-affinity and reservation rules. Defragmentation can greatly increase downtime and reduce the likelihood of a successful power on of the Atlantis ILIO VM. Both the defragmentation of resources and the fact that HA will start the VMs on any available server mean that the Atlantis ILIO and the virtual desktops associated with it could be on different hosts.  Although this is a supported configuration, it is less than ideal from a performance and resource utilization perspective.
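For anyone who wants to check the slot arithmetic, here is a small Python sketch. The variable names are mine, and the way I derive the "18 or 19" figure (the failed host's slots spread over the seven surviving hosts) is my reading of the numbers rather than anything vSphere reports directly.

```python
import math

HOST_RAM_GB = 256
HOSTS = 8
SLOT_GB = 2                    # das.slotMemInMB set to 2048
ILIO_RESERVATION_GB = 60
HOST_FAILURES_TOLERATED = 1

# Slots available on each host and slots the ILIO VM needs.
slots_on_host = HOST_RAM_GB // SLOT_GB
ilio_slots = math.ceil(ILIO_RESERVATION_GB / SLOT_GB)

# Assumption: one failed host's worth of slots is spread over the survivors.
surviving_hosts = HOSTS - HOST_FAILURES_TOLERATED
slots_to_absorb = slots_on_host * HOST_FAILURES_TOLERATED
reserved_low = slots_to_absorb // surviving_hosts
reserved_high = math.ceil(slots_to_absorb / surviving_hosts)

# The ILIO VM can only restart without DRS moves if one host has enough free slots.
ilio_fits_without_defrag = reserved_low >= ilio_slots

print(slots_on_host, ilio_slots)        # 128 30
print(reserved_low, reserved_high)      # 18 19
print(ilio_fits_without_defrag)         # False
```

The gap between roughly 18 free slots per host and the 30 that ILIO needs is the reason defragmentation comes into play.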

Percentage of Cluster Resources Reserved

This Admission Control policy is the most common in actual use.  The main advantage of the percentage based Admission Control Policy is that it avoids the commonly experienced slot size issue where values are skewed due to a difference in reservations between VMs on the host.

This will add up the total amount of CPU and Memory resources over the cluster and reserve a certain amount for HA purposes. If we configure both memory and CPU to be reserved at 13% for our 8 host cluster, it will look as below:

This will give us enough spare capacity across the whole cluster to account for one failed host (at 12.5% of the cluster resources).  In this case, the ILIO VM requires 24% of a single host’s capacity, so the spare capacity spread across the hosts is fragmented and has to be defragmented before the ILIO VM can be powered on. As stated above, defragmentation can increase downtime and reduce the likelihood of a successful power on of the ILIO VM. Both the defragmentation of resources and the fact that HA will start the VMs on any available server mean that the Atlantis ILIO and the desktops associated with it could be on different hosts.  Although this is a supported configuration, and we have many customers successfully using this design, it is less than ideal.
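The percentages can be worked through in Python as well. Again this is just a sketch with my own variable names, looking at memory only and assuming the reserved capacity ends up spread evenly across the hosts. Note that 60GB out of 256GB comes out at a little over 23%, which the text above rounds to 24%.

```python
import math

HOSTS = 8
HOST_RAM_GB = 256
ILIO_RESERVATION_GB = 60

# One host out of eight is 12.5% of the cluster; round up to a whole percent.
one_host_share = 100 / HOSTS
reserve_percent = math.ceil(one_host_share)

cluster_ram_gb = HOSTS * HOST_RAM_GB
reserved_gb = cluster_ram_gb * reserve_percent / 100

# Assumption: the reserve is spread evenly, so each host holds back this much.
reserved_per_host_gb = reserved_gb / HOSTS

# Share of one host that the ILIO reservation represents.
ilio_share_of_host = 100 * ILIO_RESERVATION_GB / HOST_RAM_GB

print(reserve_percent)                              # 13
print(round(reserved_per_host_gb, 1))               # 33.3
print(round(ilio_share_of_host, 1))                 # 23.4
print(reserved_per_host_gb >= ILIO_RESERVATION_GB)  # False
```

Each host holds back about 33GB, which is well short of the 60GB the ILIO VM needs in one place.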

Specify Failover Hosts

With the “Specify Failover Hosts” Admission Control Policy, HA will attempt to restart all virtual machines on the designated fail-over hosts. The designated fail-over hosts are essentially “hot standby” hosts. In other words, DRS will not migrate virtual machines to these hosts when resources are scarce or the cluster is imbalanced.

This option both guarantees that you will be able to restart your VMs and keeps the desktop VMs on the same host as their associated ILIO. The reason that many people do not pick this policy is that the dedicated failover hosts are not utilized during normal operations.  While this is a fair point, any reservation, be it slots, percentage or host, is going to prevent that proportion of your infrastructure from being used. The fact that it’s in one place rather than spread out over the cluster isn’t a problem in my opinion.

Conclusion

As a virtual machine, Atlantis ILIO can take advantage of the availability and resiliency functionality of the underlying virtualization platform to ensure the reliability of virtual desktop infrastructure, while at the same time lowering cost and increasing performance. In the case of VMware vSphere, the best approach to VMware HA with Atlantis ILIO is to use Admission Control policies with a specified fail-over host.  Using this approach, the Atlantis ILIO virtual machine will remain on the same physical server as the virtual desktops using its data store, ensuring the fastest possible recovery from a host failure with optimal performance.

Resources:

VMware KB 1031703

VMware vSphere 5.1 Clustering Deepdive

Yellow Bricks HA deepdive

vSphere High Availability Deployment Best Practices

Yellow Bricks – Percentage Based Admission Control gives lower VM restart guarantee?

Windows 7 IOPS for VDI: Deep Dive

Posted by Jim Moyle on May 20th, 2011

I had the pleasure recently of presenting at both BriForum and the E2E conference.  Both these events are excellent resources for anyone wishing to know more about Desktop Virtualisation and I am always proud to be surrounded by other speakers who I regard as the absolute best in the business.

My topic of choice was the same as the title for this post Windows 7 IOPS for VDI: Deep Dive.  As I hate having lots of text on slides when presenting I created an accompanying document which I have now made available for download here:

Windows 7 IOPS for VDI: Deep Dive (Short form)

Please comment if you have any requests for more testing or a particular update to the document.

If for any reason the above link doesn’t work here is a mirror.

Amazon and Wikileaks: Can we trust the cloud?

Posted by Jim Moyle on December 6th, 2010

So the recent furore around Wikileaks has got me thinking about the cloud in a slightly different fashion.  I have always said that one of the big issues with the cloud has to be that you are no longer a big fish where it concerns the infrastructure that your data or applications reside upon.

If you own and run your own infrastructure then you are the biggest fish around when it comes to safeguarding the integrity of your applications and data.  It only takes the CEO to say ‘Jump’ once and everybody in the IT department starts asking ‘how high?’.

If you are a tenant on a shared service in the more traditional sense you may still be the biggest customer and you still may find yourself in the driving seat.

If we start to look at the biggest providers of all, namely Amazon and RackSpace, you are no longer a big fish; in fact you will in all probability be a minnow.  Amazon has kicked Wikileaks off its servers in response to political pressure, using violation of its Terms of Service as an excuse.

This is the relevant section from their ToS:

11.2. Applications and Content. You represent and warrant: [...] (iii) that Your Content (a) does not violate, misappropriates or infringes any rights of us or any third party, (b) does not constitutes defamation, invasion of privacy or publicity, or otherwise violates any rights of any third party, or (c) is not designed for use in any illegal activity or to promote illegal activities, including, without limitation, use in a manner that might be libelous or defamatory or otherwise malicious, illegal or harmful to any person or entity, or discriminatory based on race, sex, religion, nationality, disability, sexual orientation, or age;

This seems to be needlessly vague and could, in fact, be made to apply to any client.  So what is being made clear is that if you use the cloud, you can be kicked off the service on a corporate whim.  The fact that you are now a minnow means that there is no longer any pressure on the hosting organisation to care about you at all.

This does not just apply to highly politically controversial sites. If porn or nudity rules are tightened, childbirth or anti-rape sites, or even the Scunthorpe tourist board, could be taken down.  Nor does it only apply to accidental inclusions: any change to the ToS could mean that you no longer qualify to use the service.

What does this mean?  Well, I’d say the old adage ‘if your data doesn’t live in three places, it doesn’t exist’ might well apply here, i.e. use more than one cloud provider and duplicate your data.

The trouble is, if you use the above theory, it completely negates one of the big cloud advantages: Their backup, uptime and data retention policies to ensure the safety of your data are world class so you don’t have to bother.

Whatever the politics around Wikileaks, the willingness of the biggest provider to so publicly drop a client, with no recourse, has to make everyone think again before moving into the cloud.

Replacement troubleshooting tools for Citrix Program Neighborhood giving direct ICA/HDX connection to a XenApp Server.

Posted by Jim Moyle on July 16th, 2010

I was just with a client and needed a tool to launch a direct ICA connection to a server.  I knew such a Citrix tool existed, but for the life of me I couldn’t remember the name of it.  My Google skills also somewhat deserted me at the time and it was only later that I found the correct CTX article.

I was also under the impression that Nick Holmquist had stopped development on his xConnect tool which did a similar job.  Nick kindly pointed me to the correct Citrix Tool which is Citrix Quick Launch and told me that in fact he has not stopped developing his xConnect tool (think of it as mstsc for ICA).  You can find the beta download for his tool at tekhelix.

Citrix XenClient, first thoughts

Posted by Jim Moyle on May 18th, 2010

So I’ve had a chance to play with the new Citrix XenClient Express RC for a couple of days now. I was lucky enough to have a laptop on the HCL (a Lenovo T400) with enough RAM to cope with multiple VMs.  When testing I’ve tried to keep in mind that this is a 1.0 release candidate and not yet ready for production.  Ian Pratt has famously said that if he had known how hard this was going to be he wouldn’t have done it, so was all that effort worth it?

Installation

Installing the XenClient host software was very easy and went exactly as it was supposed to, although there is no option to slip the hypervisor under the OS, or to back up and retrieve the OS, so as yet no in-place upgrades are possible.  Whether it is possible to use XenConvert or Sysinternals disk2vhd to create a VHD, load it into the Synchronizer server and redeploy, I don’t know; it’s certainly not a documented feature (edit: See comments).

There are two parts to the host installation software, though both install as one: the Xen hypervisor itself and what Citrix call the Citrix Receiver. This is what they usually call their client software, so presumably it will act much more like its namesake later.

Installing the Synchronizer server was equally simple: just import the xva into a XenServer and spend a couple of minutes configuring it as per the documentation.  I’m not surprised that there isn’t an option to use Hyper-V or ESX at the moment, although Citrix have shown that they are willing to port their virtual appliances to other hypervisors, so I wouldn’t be surprised to see the virtual appliances arrive on competing hypervisor products at some point.  At this point I would recommend a little bit of further configuration, which I will cover below in the XenClient with Synchronizer section.

Guest Installation

Creating the VMs is again very simple: pick your OS, RAM and vCPUs and that’s about it.  One issue with creating the VM is that the wireless card can only be used in shared mode, while the wired card can be used in the more traditional shared, bridged or host configuration.  This means that you can only use NAT with the wireless card, which could of course cause issues.  For instance, you can’t configure an extra IP address on the card.  I get the feeling that getting the wireless networking to work was one of the harder things that Citrix had to do.

So far, supported guests are limited to 32-bit Windows client operating systems: Windows XP, Vista and Windows 7.  I decided to install Windows 7.  There is no option to mount an iso file and I had to search around to find a burner to produce the installation disc.  Windows 7 installed without a problem, though I felt perhaps a little slowly.  This may just have been a case of a watched pot, but without the guest tools installed, a slower install would make sense.

The XenClient iso is automatically mounted into the operating system on boot, although this is the only iso file you can mount using the hypervisor.  The installation of the XenClient tools was very temperamental, either giving errors or missing installing the audio device. The installation takes a while longer than I’m used to, but I suppose it’s doing a lot more, after it’s finished two reboots are required.

Experimental Features

The two experimental features available are 3D support and application publishing.  I hope that both features make it off the experimental list and onto the supported list by release.

3D support is configured by enabling it within the Citrix Receiver console and installing software in the guest VM.  This went pretty smoothly. I took a couple of screenshots of the Windows Experience Index before and after 3D was enabled; unfortunately I didn’t take a screenshot before I wiped my laptop, which would have given the best comparison, but here are the two shots:

Before 3D: [screenshot: performance after tools installed]

After 3D: [screenshot: performance after 3D enabled]

As you can see the 3D performance is dramatically better.  How close it is to native performance I’m not sure.  3D can only be enabled on one running VM at a time.  Also you can’t publish applications from a 3D enabled VM, presumably as the publishing protocol (modified ICA?)  can’t handle the difference in screen output.

Application publishing seemed to go well and worked as documented.  You need to install software on both the publishing VM and the receiver.  I wonder if the traffic goes over IP or through the hypervisor, through the hypervisor would be more secure.

Peripheral Support

What is and isn’t supported seems to be a bit of a mystery at the moment, I can’t find any documentation on which classes of device are supported.

The extra buttons on my mouse don’t work. This is apparently because all mouse and keyboard input goes through the hypervisor and the more advanced features are not supported; I would guess that this means any proprietary buttons on a keyboard would be non-functional too.  This is actually a much bigger problem than it might appear. With any new tech, user acceptance is key, and taking away functionality from such basic things as mouse and keyboard would affect almost all users and cause numerous complaints.

USB hard drives worked fine, except they were not recognised on boot, they had to be unplugged and re-plugged to be picked up, presumably the tools need to be running before the device is plugged in.

My USB headset was not recognised at all, despite the drivers being native to Win7.

The fingerprint reader on the laptop also wasn’t recognised.

Although I didn’t have a webcam to test, there are a few forum posts complaining about lack of functionality.  Citrix say they are hoping to have these in by release.

I said before that Wireless support was one of the harder things to do, I think that USB support may well be the other thing they had trouble with.  In device manager I always got a non functional USB hub, whether this is just me or an issue with the tools I don’t know.

XenClient with Synchronizer

Creating the template VM was simple, though remarkably slow. My transfer rate was between 500 and 2,500 kbps, which really needs to improve if you are transferring gigs of data.  I then created a new VM and used the template for the install; again it worked fine, if painfully slowly.

I then installed a few apps and took a backup of the OS, after which I destroyed my local VM.  Restoring it worked, but first the client downloaded the six gig template image, then the 10 gig of backup. Why not just restore from backup?  This also happened at a snail’s pace; I had to leave it overnight :( .
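To put rough numbers on "painfully slow", here is a quick Python sketch of the transfer time for the 16 gig restore (six gig template plus 10 gig backup). The function name is my own, and because I can't be sure whether the rate shown was kilobits or kilobytes per second, it handles both readings.

```python
def transfer_hours(size_gb: float, rate_kbps: float, unit_is_bits: bool = True) -> float:
    """Hours needed to move `size_gb` at `rate_kbps`.

    unit_is_bits=True treats the rate as kilobits per second;
    False treats it as kilobytes per second. Decimal units throughout.
    """
    size_kilobits = size_gb * 8 * 1000 * 1000
    rate_kilobits = rate_kbps if unit_is_bits else rate_kbps * 8
    return size_kilobits / rate_kilobits / 3600


restore_gb = 6 + 10  # template image plus backup

for rate in (500, 2500):
    as_bits = transfer_hours(restore_gb, rate)
    as_bytes = transfer_hours(restore_gb, rate, unit_is_bits=False)
    print(f"{rate} kbps: {as_bits:.1f} h if kilobits, {as_bytes:.1f} h if kilobytes")
```

On either reading, the slower end of the range means a restore measured in many hours, which matches leaving it overnight.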

Additional users can be created in the Synchronizer or imported from AD; once there, though, they can’t be deleted.  This is due to issues with checked out VMs.

One thing to note with the Synchronizer appliance is that it starts with a default of 20 gig of disk space, which will obviously get used up very fast.  You can either connect it to an NFS share or expand the disk; I’d advise you do one of these at the very start to avoid space issues.

The Synchronizer seems to be very basic at the moment, I’d expect the feature set of this to be expanded before release.

Other stuff I tried

Windows 2008 server (32 bit of course) installed OK, and seemed to work fine, but installing the tools broke the wireless networking.  If you don’t need wireless I don’t see why this shouldn’t be absolutely fine.

Ubuntu wouldn’t install, it hung very soon into the process, though it would run live from the CD.

Is it any good?

I have to say I’m impressed. The consoles were snappy and easy to use, and apart from some issues with installing the tools everything worked as documented.  I expected the USB support to be iffy and the HCL to be small; both of these will improve over time.  The HCL especially will get better quickly as Dell and HP OEM XenClient and add their own drivers.

Peripheral support is a big deal, they really need to get USB support as close to native as possible, or acceptance is going to be hard.

Guest support is OK, though Linux guest support needs to arrive quickly, one of the major benefits of having a client hypervisor will be having 3rd party virtual appliances sitting in the background. I’m sure this is where major innovation and value add from third parties is going to come.  Without Linux support, that’s not going to happen.

Citrix XenClient Hardware Compatibility List (HCL)

Posted by Jim Moyle on May 12th, 2010

This list is taken from the CTX125133 article Citrix XenClient 1.0 RC User Guide.

Although the HCL is currently very small, I bet Citrix will be relying on third party vendors to OEM XenClient and add their own drivers.

Supported laptop models

Vendor | Product Type | WiFi | Graphic | CPU | Chipset
------ | ------------ | ---- | ------- | --- | -------
Dell | Latitude E4300 | Intel WiFi 5100, Intel WiFi 5300 | Intel GM45 | Intel Centrino 2 | Intel GS45 Express Chipset
Dell | Latitude E6400 | Intel WiFi 5100, Intel WiFi 5300 | Intel GM45 | Intel Centrino 2 | Intel 45 Express Chipset
Dell | Latitude E6500* | Intel WiFi 5100, Intel WiFi 5300 | Intel GM45 | Intel Centrino 2 | Intel 45 Express Chipset
Dell | OptiPlex 780 | (none listed) | Integrated Intel Graphics Media Accelerator 4500 | Intel Core2 Quad, Intel Core2 Duo, Intel Pentium Dual Core, Intel Celeron Dual Core, Intel Celeron | Intel Q45 Express Chipset w/ ICH10DO
Dell | Latitude E6410 | Intel Centrino 802.11 | Intel HD Graphics | Intel Centrino 2 | Mobile Intel QM57 Express Chipset
Dell | Latitude E6510 | Intel Centrino 802.11 | Intel HD Graphics | Intel Centrino 2 | Mobile Intel QM57 Express Chipset
HP | EliteBook 6930p | Intel WiFi 5300 | Intel GM45 | Intel Centrino 2 | Intel GM45 Express Chipset
HP | EliteBook 2530p | Intel WiFi 5100 | Intel GM45 | Intel Centrino 2 | Intel GM45 Express Chipset
HP | EliteBook 8440p | Intel Corporation Centrino Advanced-N 6200 | Intel HD Graphics | Intel Core i7-720QM, Intel Core i7-620M, Intel Core i5-540M, Intel Core i5-520M | Mobile Intel QM57 Express
Lenovo | ThinkPad T400 | Intel WiFi 5100, Intel WiFi 5300, Intel WiFi 5350 | Intel GM45 | Intel Centrino 2 | Intel 45 Express Chipset
Lenovo | ThinkPad T500 | Intel WiFi 5100, Intel WiFi 5300, Intel WiFi 5350 | Intel GM45 | Intel Centrino 2 | Intel 45 Express Chipset
Lenovo | ThinkPad X200 | Intel WiFi 5100, Intel WiFi 5300, Intel WiFi 5350 | Intel X4500 HD | Intel Centrino 2 | Intel 45 Express Chipset

*The 15.4″ Premium, UltraSharp™ WUXGA (1920×1200) Display with High Brightness (Wide View) is not supported. Only the 15.4″ Premium WXGA+ (1440×900) LED Display (Wide View) and 15.4″ Premium WXGA (1280×800) Display models are supported.

Hardware requirements

XenClient runs on the 64 bit hardware platforms listed above only. The additional hardware requirements are:

• Intel Core 2 Duo series processor with Intel VT-x Technology and VT-d Technology (Intel vPro Technology)

• 4GB or more memory recommended

• 160GB or more disk drive space recommended

• Intel Integrated Graphics 4500MHD

Note:

XenClient does not support the use of non-symmetric RAM DIMMs.

Supported operating systems

XenClient supports the installation of the following operating systems:

• Microsoft Windows 7 32bit

Note:

If you prefer to use Windows XP Mode to run your Windows XP applications on your Windows 7 VM, (instead of using a separate Windows XP VM) please ensure that you download the latest version of the Windows XP Mode software from http://www.microsoft.com/windows/virtual-pc/. Some earlier versions of the Windows XP Mode software used Intel VT-x virtualization technology in a way that conflicted with XenClient operation. The latest version of Windows XP Mode does not use Intel VT-x virtualization technology.

• Microsoft Windows Vista 32bit SP2

• Microsoft Windows XP 32bit SP3

Note:

The installation or modification of software directly on the XenClient host file system is not supported.

What is needed from an IaaS cloud provider for us to cloudburst.

Posted by Jim Moyle on May 10th, 2010

As I’m pulling together my session for BriForum I need to choose which Cloud provider to use for the demo.  I’ve come up with a list of seven pre-requisites I need and thought I’d share them with you.  I’ve refined this list as I’ve experimented with various providers to try and judge their suitability.  This list has been compiled for what I consider to be the minimum for a production IaaS offering.  Don’t take it as gospel though, your needs may be different, regard it as a starting point.

1. Open API

This is needed to automate the start-up, configuration and termination of cloud instances.  Without automation the cloud infrastructure is no use to you; a manual, web page driven administration process is not going to win a provider any points with me.  As a secondary point, it’s even better if they provide tools that integrate with these APIs.  Making me write tools is, again, not going to win any points.
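To show what I mean by automating start-up, configuration and termination, here is a minimal Python sketch. Everything in it is hypothetical: `IaasProvider`, `FakeProvider`, `burst` and `scale_in` are names I made up, and no real provider's API looks exactly like this. The fake provider just keeps instances in memory so the example runs without a cloud account.

```python
from __future__ import annotations

from typing import Protocol


class IaasProvider(Protocol):
    """Hypothetical minimum an open API needs to expose for cloudbursting."""

    def start_instance(self, image: str) -> str: ...
    def configure_instance(self, instance_id: str, settings: dict) -> None: ...
    def terminate_instance(self, instance_id: str) -> None: ...


class FakeProvider:
    """In-memory stand-in for a real provider, for illustration only."""

    def __init__(self) -> None:
        self.instances: dict[str, dict] = {}
        self._next_id = 0

    def start_instance(self, image: str) -> str:
        self._next_id += 1
        instance_id = f"i-{self._next_id}"
        self.instances[instance_id] = {"image": image, "settings": {}}
        return instance_id

    def configure_instance(self, instance_id: str, settings: dict) -> None:
        self.instances[instance_id]["settings"].update(settings)

    def terminate_instance(self, instance_id: str) -> None:
        del self.instances[instance_id]


def burst(provider: IaasProvider, image: str, count: int, settings: dict) -> list[str]:
    """Start and configure `count` instances, returning their ids."""
    ids = []
    for _ in range(count):
        instance_id = provider.start_instance(image)
        provider.configure_instance(instance_id, settings)
        ids.append(instance_id)
    return ids


def scale_in(provider: IaasProvider, ids: list[str]) -> None:
    """Terminate every instance started by a burst."""
    for instance_id in ids:
        provider.terminate_instance(instance_id)


provider = FakeProvider()
started = burst(provider, "xenapp-worker", 3, {"farm": "demo"})
print(started)
scale_in(provider, started)
print(provider.instances)
```

The point is that the whole lifecycle is three calls in a loop; if any of them can only be done through a web page, the loop can't be written.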

2. Secure IP connectivity

By this I mean the ability to secure the connection between a cloud IP subnet and private infrastructure.  If I need to create instances on demand, I need to be able to securely access the subnet they are on and hide those machines from the ‘net.  Only being able to access machines securely on an individual basis will not do.

3. Decent guest start-up time

By this I mean under ten minutes guaranteed.  If you only promise between 15 and 45 minutes (RackSpace) then it’s too slow.  Also as billing usually starts from the request not the availability I don’t want to be paying for time I’m not using.  The solution for this would be to move to a billing from availability model, this would motivate providers to get guests up quickly.
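A quick Python sketch shows why the billing start point matters. The function is my own illustration, and the rounding up to whole hours is an assumption about how a provider might bill rather than any particular provider's rule.

```python
import math


def billed_hours(startup_minutes: int, usage_minutes: int, bill_from_request: bool = True) -> int:
    """Whole hours billed for one instance.

    bill_from_request=True starts the clock when the instance is requested;
    False starts it only once the instance is available.
    Assumption: usage is rounded up to the next full hour.
    """
    billable = usage_minutes + (startup_minutes if bill_from_request else 0)
    return math.ceil(billable / 60)


# A 45 minute start-up followed by two hours of actual use.
print(billed_hours(45, 120))                           # 3
print(billed_hours(45, 120, bill_from_request=False))  # 2
```

In that example a slow start-up costs a whole extra billed hour that was never used.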

4. Support for new Guest versions is quickly adopted

If a new hypervisor or a new OS version comes out, I want to be able to take advantage of those features quickly. I especially don’t want my local infrastructure to be held up by interoperability problems with cloud services if they are behind the upgrade curve.  When you are waiting on a large corporation to upgrade and your business is too small to put pressure on them to make you a special case, you are going to get pretty angry pretty quickly.  There is at least one cloud provider (I’m looking at you, Amazon EC2) which doesn’t support Windows 7 or Windows Server 2008 R2, and it’s nine months after RTM.

5. Hypervisor Access

I need to be able to upload my own virtual machine appliances, whether they are from a third party or one I’ve made onsite.  I also need to be able to manage the hypervisor layer with the same tools and using the same skills that I already have in house.

6. Keyboard Video Mouse console access

There is a reason that servers have KVM boards, it’s that not all problems happen after you have RDP or SSH access.  You lose a whole lot of troubleshooting information if you lose visibility of the console.

7. Hourly billing

All instances should be able to be billed hourly, I don’t mind if you have monthly charges as well, but hourly should always be available, if I need a resource permanently, I might as well host it myself.  Give us the option to try out, demo and burst into the full range of your offerings.

So have I found a provider that fits the bill?  The short answer is no.  The slightly longer answer is that I’ve found one who are really close, close enough that I’m happy to use it.  That provider is SoftLayer.

I reserve the right to change my mind at any time as providers change their offerings. :)

BriForum 2010

Posted by Jim Moyle on April 29th, 2010

BriForum this year will be running from June 15th to 17th in Chicago. This is a conference that I have wanted to go to for a long time, but never before had the chance.  This year not only will I be going for the first time, but I will be speaking alongside my colleague Rick Dehlinger.  The topic is spun out from my blog post Do we have the right tools to cloudburst xenapp into ec2?  and will be a deep dive into what it takes to do this.  If you are planning on attending BriForum, come along and see what we have to show you, or just come up and say hello.

Here is the topic detail:

CloudBursting XenApp – hype or reality?

It seems like every vendor in the world is hyping ‘cloud’ somewhere in their marketing pitch. The noise is so prolific that it’s hard for any astute technologist to ignore. As desktop and application delivery specialists, many of us have been building and running ‘clouds’ for quite some time, albeit ‘private clouds’, private delivery systems, centralized hosting environments, or whatever the term du jour may be.

As we’ve come to expect, the delivery technologies we use and the plethora of available services delivered out of the cloud have evolved at a dramatic pace. As we dive down into the microcosm of our specific niche of the industry, we’re seeing a couple vendors pitching a hybrid approach to cloud service consumption – Citrix and Amazon. The noise they’re making together means that we’ll all likely have to field questions on the topic sometime soon, which begs the question: Is it real, or is it hype?

This session explores this hybrid approach to cloud usage (which has been called ‘cloud bursting’) and seeks to answer some of the key questions on all of our minds, including:

  • What is ‘cloud bursting’?
  • Why would anyone want to do it?
  • Is it reality or hype?
  • What are some of the things we have to consider before adopting such an approach?
  • Which vendors provide the right cloud infrastructure?
  • What are the infrastructure components we need to achieve the right result?
  • How do Citrix and Amazon do it?
  • What support does Citrix provide to help?
  • What support does Amazon provide to help?
  • How do I build it?
  • Can I do it with ‘off the shelf components’?
  • Can I extend my existing infrastructure?

User Installed Applications – My Take

Posted by Jim Moyle on January 29th, 2010
The conversation about user installed applications has been happening for a while now and much has been said about it by many people, such as Andrew Wood, Gareth Kitson, Chris Oldroyd, Daniel Feller, Jeff Pitsch, Ron Oglesby, Brian Madden, Chris Fleck and more.  The purpose of this post is both to oblige a few people who have asked me to put my thoughts down and for me to clarify exactly what I think.  I’m going to ignore BYOC and client hypervisors for the time being to concentrate on the issues surrounding the applications.
To set out why I think this topic is important: user installation of applications is the key differentiator for VDI over terminal services.  As I said in a previous post, ‘Why is VDI changing into Terminal Server?’, the difference between Terminal Services and VDI is actually very small without it.
If we want to understand why this change is now possible we should look at why it has been impossible in the past.
Terminal Server:  Any change by one person can adversely affect anyone else running on that box.  This is not likely to change, and to my mind it is the biggest single historical drawback to TS based solutions, with no end in sight.
Fat Desktops:  Support is the key here.  If a user broke their PC, they usually couldn’t fix it, and it took a ‘man in a van’ to go and resolve the issue.  This is especially problematic where the user has a time critical job, or the site is far away.  Of course remote tools help with this, but desktops don’t have KVM boards for when the OS goes south.  Allowing users free rein meant that support calls would go through the roof, and as the time to resolve was huge, companies that didn’t lock down the desktop would spend massive amounts of time, energy and money just keeping the wheels on.
For the past fifteen years, whether enterprise desktops have been fat client or terminal server based, the only choice has been to lock them down.  This means industry inertia seems to be almost unstoppable.

The situation has now changed.  Our user base is changing: we now have the Echo/Y generation, who grew up with computers and learned to type at school along with writing.  They break and maintain their own home PCs, and they regularly download and use the tools they need to get the job done.  As these people move into management, the old monolithic top-down attitude of only using what the IT department gives them to do their job will be anathema to them and they will start to demand change.  The people who do a job, day in day out, know what tools they need to be productive much better than the IT dept does. If we don’t give them those tools they will resent us for not enabling their work.  We need to empower people to be more productive, not take away their motivation, morale and confidence in the organisation.

If we bring the desktop OS into the datacenter we should be able to bring to bear the tools to enable this kind of user empowerment.

If we are going to allow this, we have to classify the different types of user installed applications.  To borrow a little from Simon Bramfitt, with some of my own (in italics), here’s what we are talking about:
  • The departmental app that works with business data that is formally acknowledged as being important to that department and has its own budget and support mechanism, but is for whatever reason not packaged by IT. This notion may not sit well with some people, but anyone who has worked in a large enterprise knows they exist and might privately offer plenty of justifications as to why an app might fall into this bucket.
  • The communication app: GoToMeeting, WebEx clients etc. that may need to be installed by the user. They may also need other clients to tie into outside companies’ systems, e.g. they may need to install a Citrix web client, or a proprietary ActiveX plugin for company XYZ’s web app.
  • The personal productivity app that fulfills a limited business function, legitimately purchased but not formally acknowledged by IT as a supported app. A copy of MindMapper maybe that’s needed to map up a new business process. It may only be used by a few people across the enterprise but it fills an important role for them.
  • The personal non-productivity tool like iTunes that is OK to have in a BYOPC environment, but not the sort of thing you want interfering with the corporate computing environment. Although a case could be made for iTunes U and work oriented podcasts etc.
  • The totally unauthorised, no excuse, just downloaded from the internet, malware vector that claimed to be a free ring-tone generator.

As Microsoft found out to its cost, allowing uncontrolled user installed apps is a nightmare. So if a user can install all of the above, how do we both allow the right apps and protect ourselves against the wrong ones AND reduce our support costs?

  • Any application that directly manipulates business data must be provided by the enterprise.
  • The desktop OS must be treated as an untrusted device.
  • Approved applications should be delivered by TS or App streaming.
  • The users must have a method for choosing from available enterprise applications.
  • User data and enterprise application settings must be separate from user installed application settings.
  • Users must have the ability to roll back their environment to any point in the past, while keeping data and enterprise application customisations.
  • Users must be able to reset their machines to virgin state whilst keeping data and enterprise application settings.
The last two are the keys to reducing the support costs, i.e. if the user breaks things, you give them the tools to fix it without needing IT skills.  This is possible at the moment with Atlantis, and AppSense have something in the works to enable this, coming out soon.
If the users have an appropriate method to choose their own enterprise apps, e.g. Dazzle, they are less likely to need to install their own.  If a large percentage of users are installing a certain app, for instance if a client sends a department files in tar.gz format and 7-Zip becomes prevalent in the organisation, then the IT department should be able to see this and change it from an unsupported user installed application to a supported enterprise provided application.  I call this the ‘park paths’ methodology.  To do this you need a way to catalog exactly what users are installing.  As an interesting side effect, this may be what brings Open Source apps into the enterprise for the first time.
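
As a rough illustration of the ‘park paths’ idea, here is a minimal Python sketch that takes a catalog of what each user has installed and lists the apps that have become common enough to consider supporting.  The function name, the data layout and the threshold are all my own assumptions for the example; it is not tied to any particular inventory product.

```python
from collections import Counter
from typing import Dict, List, Set


def apps_to_promote(installs: Dict[str, Set[str]], total_users: int,
                    threshold: float = 0.25) -> List[str]:
    """Return apps installed by at least `threshold` of all users.

    `installs` maps a user name to the set of apps that user installed.
    The result is ordered by install count (highest first), then by name.
    The 25% default threshold is an arbitrary, hypothetical cut-off.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    # Count how many distinct users installed each app.
    counts = Counter(app for apps in installs.values() for app in apps)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [app for app, n in ranked if n / total_users >= threshold]


if __name__ == "__main__":
    # Hypothetical catalog of user installed applications.
    catalog = {
        "alice": {"7-zip", "itunes"},
        "bob": {"7-zip"},
        "carol": {"7-zip", "mindmapper"},
        "dave": {"itunes"},
    }
    print(apps_to_promote(catalog, total_users=4, threshold=0.5))
```

Where to set the threshold is a judgment call for each organisation; the point is only that the catalog makes the well-trodden paths visible.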

If users can provide themselves with the tools they need in a timely fashion (and let’s face it, this is exactly what IT admins have been doing for years), business agility is increased and, with the right tools, support is decreased and application provision is improved, giving the organisation lower costs and a competitive advantage.

User installed applications are a minefield, but with the right approach I believe that they could be the VDI killer feature.

Do we have the right tools to cloudburst XenApp into EC2 now?

Posted by Jim Moyle on December 17th, 2009

With the recent release of the Amazon Workflow Studio library for Citrix’s Workflow Studio product, one of the major pieces fell into place to enable us to cloudburst XenApp into the EC2 cloud.  Now it’s here, I want to have a look at whether we have all the tools we need to start putting this into practice.

So what is cloudbursting and why would you want to do it?  Cloudbursting is the ability to expand your existing datacenter infrastructure into the cloud.  This could be useful at times of high demand, for instance seasonal peaks around Christmas, or if your existing infrastructure loses capacity in a disaster recovery situation.

One of the major stumbling blocks on the way to widespread acceptance of utilising cloud infrastructure is the fear in the eyes of many executives of losing control of their data.  What happens when your critical data is stored on someone else’s infrastructure?  Is it secure?  Is it reliable?  Is the support good enough?  What are the response times?  Can you believe the providers when they say you don’t have to worry about your fears?

This is why a XenApp workload is particularly suitable for cloudbursting: there shouldn’t be any data stored on XenApp servers.  They should also all be identical, making them conducive to fast provisioning.

Imagine the situation of a sales call center.  Over the Christmas period they hire temp staff to cope with extra demand.  As their software is provided via XenApp, the company needs enough infrastructure to cope with the demand peak, but that infrastructure sits idle most of the year.  As EC2 charges per hour, using EC2 to cope with the extra demand could save large sums of money.
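
To make the arithmetic concrete, here is a rough Python sketch of that comparison.  Every figure in it (the number of extra servers, the yearly cost of an owned server, the hourly rate and the length of the peak) is a made-up assumption for illustration, not real EC2 or hardware pricing.

```python
# Rough cost comparison: owning peak capacity all year vs. bursting for the peak.
# All numbers below are hypothetical placeholders, not real pricing.

HOURS_PER_DAY = 24


def owned_cost(extra_servers: int, yearly_cost_per_server: float) -> float:
    """Cost of keeping the extra peak servers in your own datacenter all year."""
    return extra_servers * yearly_cost_per_server


def burst_cost(extra_servers: int, hourly_rate: float, peak_days: int) -> float:
    """Cost of renting the same servers by the hour, around the clock, for the peak only."""
    return extra_servers * hourly_rate * HOURS_PER_DAY * peak_days


if __name__ == "__main__":
    extra_servers = 10               # assumed extra XenApp servers for the peak
    yearly_cost_per_server = 3000.0  # assumed hardware, power and support per year
    hourly_rate = 0.5                # assumed cloud price per server hour
    peak_days = 42                   # assumed six week seasonal peak

    owned = owned_cost(extra_servers, yearly_cost_per_server)
    burst = burst_cost(extra_servers, hourly_rate, peak_days)
    print(f"Owned all year: {owned:.2f}")
    print(f"Burst for peak: {burst:.2f}")
    print(f"Difference:     {owned - burst:.2f}")
```

The shorter the peak relative to the year, the more the hourly model wins; plug in your own figures to see where the break-even sits.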

Now we have decided that cloudbursting is a good idea, can we actually do it?  By this I mean: are the tools available off the shelf, without a large development effort?

So what do we need?

  • A secure link between the cloud and your datacenter
  • The ability to quickly provision and decommission servers in the cloud
  • WAN acceleration between the cloud and you
  • Monitoring to know when to cloudburst
  • Automation to control it all

Let’s take these needs one by one:

A secure link between the cloud and you: currently Citrix provide an Amazon Machine Image (AMI) template for Citrix Access Gateway (CAG).  With one in the cloud and one on premises you can have a secure channel between the two. You could also use the Vyatta AMI.  Full marks.

With the new Workflow Studio library we can quickly provision our own saved AMIs and destroy them when needed.  The question here is why are we not using Provisioning Server?  It would be best to provision a ‘bare metal’ server and PXE boot to receive a Provisioning Server vDisk. So half marks.

WAN acceleration is possible: you can install the software Repeater client on the XenApp servers, but a proper Repeater AMI would be better. Half marks again.

Monitoring could be done either with EdgeSight or the power and capacity management feature, so full marks.

Automation is the big problem.  Although Workflow Studio, now at 2.0 with more libraries, is getting there, at the moment it simply doesn’t have enough pre-configured workflows or libraries to cope.  We need a way to join the servers to the domain and farm and publish the applications (although XenApp 6 will let us do this using GPOs). We could script this, but I want to do it without any dev work. It also needs to be able to take in the output from the power and capacity management feature set.
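
The decision the automation would have to make from the monitoring output can be sketched roughly as follows.  This is only the sizing logic, written in Python under my own assumptions: the function name, the parameters and the 20% headroom are hypothetical, and it does not call Workflow Studio, EC2 or the power and capacity management feature.

```python
def cloud_servers_needed(active_sessions: int, onprem_servers: int,
                         sessions_per_server: int,
                         headroom_percent: int = 20) -> int:
    """Return how many cloud servers are needed on top of the on-premises farm.

    Demand is the current session count plus a percentage of headroom.
    Integer arithmetic is used so the rounding up is exact.
    """
    if sessions_per_server <= 0:
        raise ValueError("sessions_per_server must be positive")
    # Scale both sides by 100 to keep the headroom percentage in integers.
    demand_x100 = active_sessions * (100 + headroom_percent)
    per_server_x100 = sessions_per_server * 100
    # Ceiling division: servers required to carry demand plus headroom.
    servers_required = -(-demand_x100 // per_server_x100)
    return max(0, servers_required - onprem_servers)


if __name__ == "__main__":
    # Hypothetical farm: 10 on-premises servers at 50 sessions each.
    for sessions in (300, 500, 510):
        needed = cloud_servers_needed(sessions, onprem_servers=10,
                                      sessions_per_server=50)
        print(f"{sessions} sessions -> {needed} cloud server(s)")
```

A result of zero means the burst servers can be decommissioned, which matters when you are paying by the hour.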

So where does that leave us?  I’d say it leaves us almost there; in fact, with a little PowerShell knowledge and using the tech preview of XenApp 6, it’s possible right now.

If I have time over the holidays I think I’ll try and set it up and let you know how I get on.


Copyright © 2007 JimMoyle.com. All rights reserved.