Fix: Cannot run upgrade script on host, ESXi 5.5 

During a recent upgrade I found that one of the ESXi hosts just would not update using Update Manager. The error I was seeing was “Cannot run upgrade script on host”.

After a bit of searching I found this article which related to ESXi 5.1 upgrade to 5.5 but the steps worked well to fix the issue I was seeing.

In order to fix the issue I performed the following steps:

Step 1: Disable HA for the cluster

Disable Cluster HA

Step 2: Go to vCenter Networking. Select the distributed vswitch and then select the hosts tab. From here, right-click on the host you need to reboot and select Remove from vSphere Distributed Switch

Remove Distributed Switch

Click Yes to remove the host from the switch.

Confirm vDS Removal

Step 3: Remove the host from the cluster

Remove ESXi host from cluster

Step 4: Enter the host into maintenance mode and then choose to reboot.

Enter Maintenance Mode
Step 5: Connect via SSH to the ESXi host and run the following commands to uninstall the FDM agent:

cp /opt/vmware/uninstallers/VMware-fdm-uninstall.sh /tmp
chmod +x /tmp/VMware-fdm-uninstall.sh
/tmp/VMware-fdm-uninstall.sh

SSH Host FDM Uninstaller
Step 6: Reboot the host

Reboot the host
Step 7: Add the ESXi host back to the cluster

rejoin host to cluster step 1

rejoin host to cluster step 2

rejoin host to cluster step 3

rejoin host to cluster step 4
Step 8: Re-add the host to the Distributed vSwitch. Go to Networking -> select the distributed vswitch. Right-click and select Manage Hosts.

Manage vDS

Select the host

Select Host

Select vnics for Uplinks to be managed by the switch

Manage vDS uplinks

Step 9: Turn vSphere HA back on for the cluster the host resides on.

Turn on vSphere HA

Step 10: Run the upgrade again from Update Manager and this time it will work.


How To: Upgrade to ESXi 5.5 Update 3b on Cisco UCS

ESXi upgrade preparation

With Cisco UCS you really need to make sure that your ESXi hosts are running the correct driver version. If you’re running NFS or FCoE storage into your ESXi hosts as either datastores or RDM disks then it’s critical that you have the right fnic and enic drivers. Even if you use the Cisco Custom image for ESXi upgrades the enic and fnic drivers may not be correct according to the compatibility matrix. I’ve had this issue in the past and I saw intermittent NFS datastores going offline for a Dev ESXi host and the resolution was to upgrade the enic driver which handles ethernet storage connectivity.

The best place to go is to VMware’s compatibility site for IO drivers which comes under the System/Servers. To find out which drivers you currently have you will need to check on the driver versions on the ESXi hosts. This can be done by following KB1027206. Using the values for the Vendor ID, Device ID, Sub-Vendor ID and Sub-Device ID it’s possible to pinpoint the interoperability with your respective hardware. In my case I have both VIC1340 and VIC1240 in the mix so I had to go through the process twice. Primarily you’ll be using the ‘ethtool -i’ command to find the driver version.
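If you have a lot of vmnics to check, the output of ‘ethtool -i’ is easy enough to parse. Below is a minimal sketch in Python, assuming the usual "key: value" layout of that output. The function name and the sample text are my own for illustration; only the driver name and version number come from my hosts, the other values are made up.

```python
def parse_ethtool_info(output: str) -> dict:
    """Parse the 'key: value' lines printed by `ethtool -i <nic>` into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as PCI addresses survive intact.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            info[key.strip()] = value.strip()
    return info


# Illustrative sample: the firmware-version and bus-info values are invented.
SAMPLE = """\
driver: enic
version: 2.1.2.71
firmware-version: 2.2(3a)
bus-info: 0000:06:00.0
"""

info = parse_ethtool_info(SAMPLE)
print(info["driver"], info["version"])
```

The driver name and version are the two fields you then look up on the compatibility site.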

enic_driver_check_vmware_kb_steps
e.g. You can check the UCS VIC 1240 for FCoE CNAs on ESXi 5.5 Update 3 here

In this image you can see that the version of the enic driver I’m running, 2.1.2.71, doesn’t match the version that will be installed as part of the Cisco Custom ISO image. This shows that the enic driver will need to be upgraded as part of the process.
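The comparison itself can be scripted too. This is only a sketch, assuming both versions are plain dotted numbers with no vendor suffix. The helper names are mine, and the required version below is a hypothetical placeholder rather than a value from the compatibility matrix.

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version such as '2.1.2.71' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, required: str) -> bool:
    """True when the installed driver is older than the required one."""
    # Comparing tuples of ints avoids the string-comparison trap
    # where '2.1.2.100' would sort before '2.1.2.71'.
    return version_tuple(installed) < version_tuple(required)


installed = "2.1.2.71"  # the enic version from my host
required = "2.3.0.7"    # hypothetical placeholder, look up the real value yourself

print(needs_upgrade(installed, required))
```

Versions with a build suffix would need extra handling before the integer conversion.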

enic_driver_check_vmware


Melbourne VMUG Review

Once again the Melbourne VMUG UserCon was a massive success and had some great speakers and sessions. Given that there were such IT heavy hitters as Scott Lowe (@scott_lowe), Chris Wahl (@chriswahl) and Keith Townsend (@CTOAdvisor) as well as a number of local IT stars such as Frank Fan (@frankfan7), Anthony Burke (@pandom_), Anthony Spiteri (@anthonyspiteri) and Craig Waters (@cswaters1) it’s not surprising that it was a great event.

One of my goals for the day was to attend a number of the community sessions. I found the vBrownBag sessions conducted by Alastair Cooke (@demitassenz) to be the most informative and entertaining sessions of the day, along with those of Chris Wahl. The award for the funniest session of the day went to Simon Sharwood (@ssharwood) from the Register as part of the vBrownBag session. It wasn’t just entertaining but a great insight into how content is derived for the site.

I missed one of the sessions I had intended on getting to but here’s a breakdown of the sessions I did attend.


Melbourne VMUG – Preview

userConn

 

The annual Melbourne VMUG UserCon takes place this Thursday, 25th February. It’s also an important day for me as it’s my wedding anniversary. I know which one my wife is more interested in! But, for the IT community in Melbourne all eyes will be on the VMUG. This year’s event has moved location from the old Hilton on the Park to Crown on Southbank. I think this is a good move and makes the VMUG even more accessible than in previous years. Last year’s guest speakers, Chad Sakac, Vaughan Stewart and John Troyer, were excellent and this year it’s been lifted another notch again. This year the enterprise IT giants include Scott Lowe (@scott_lowe), Keith Townsend (@CTOAdvisor), Brad Tompkins (@VMUG_CEO) and my own personal IT hero Chris Wahl (@ChrisWahl). There are also going to be vBrownBag sessions hosted by Alastair Cooke (@demitassenz). If you’ve been following Twitter you’ll have seen that Scott’s been having issues with flights and has had United basically crap all over his plans. Hopefully things work out for him and he can make it on time to the Sydney VMUG on Tuesday 23rd but it looks like it’ll be a close call. I wish him safe travels from here on.

VMUGs are all about the community. It’s the primary reason they exist and we’re incredibly fortunate to have the organisers volunteer their time to put on such a great event. Melbourne has some of the finest at its helm and that has been recognised globally. If you haven’t attended before I’d highly recommend fitting it into your calendar. You’ll be glad you did and your employer will be glad you did too. It’s really worth getting to the keynotes at the UserCon as, unlike keynotes at other events, they are not strictly vendor focused and they can provide some real insight into your industry as a whole and even your career path. But the main focus should be the community speeches. Hearing from others out in the field about the trials and tribulations they’ve had with specific technology is where the real learning takes place. These make up a smaller part of the agenda and it’s something I’d like to see more of in future events but I also appreciate that it’s hard to get speakers for such sessions. For me this year that’ll be my focus outside of the keynotes. There are a number of vendor based sessions as well throughout the day that delve into new technology.

There’s a lot of information and knowledge to be gleaned from this event. I’d also recommend working out your agenda before attending and have a ponder over what you’d like to get out of the event. The sessions I’m planning on attending are:

Unfortunately the community sessions clash in times but if they didn’t I’d attend the following. As I can only be in one place at one time I’ll be at the Chris Wahl session.


VMware Metro Storage Cluster Overview

VMware Metro Storage Cluster

VMware Metro Storage Cluster (vMSC) allows vCenter to stretch across two data centers in geographically dispersed locations. In normal circumstances, in vSphere 5.5 and below at least, vCenter would be deployed in Linked Mode so two vCenters can be managed as one. However, with vMSC it’s possible to have one vCenter manage all resources across two sites and leverage the underlying stretched storage and networking infrastructures. I’ve done previous blogs on NetApp MetroCluster to describe how a stretched storage cluster is spread across two disparate data centers. I’d also recommend reading a previous post done on vMSC by Paul Meehan over on www.virtualizationsoftware.com. The idea behind this post is to provide the VMware view for the MetroCluster posts and to give a better idea on how MetroCluster storage links into virtualization environments.

The main benefit of a stretched cluster is that it enables workload and resource balancing across datacenters. This helps companies to reach almost zero RTOs and RPOs and ensure uptime of critical systems as workloads can be migrated easily using vMotion and Storage vMotion. One thing to keep in mind regarding vMSC: it’s not really sold as a disaster recovery solution but rather a disaster avoidance solution when linked with the underlying storage. Some of the other benefits of a stretched cluster are:

  • Workload mobility
  • Cross-site automated load balancing
  • Enhanced downtime avoidance
  • Disaster avoidance
  • System uptime and high availability

There are a number of storage vendors that provide the back-end storage required for a vMSC to work. I won’t go into the entire list but you can find out more on the VMware Compatibility Matrix site. The one that I have experience with is NetApp MetroCluster but I know of others from EMC and Hitachi at least. So what components make up a vMSC? It comes down to an extended layer 2 network across data centers so that vMotions can take place with ease and also a resilient storage platform connected to ESXi via VMFS or NFS datastores. VMware vCenter itself does need some configuration changes but it’s nothing outside the scope of what a regular VMware admin can implement. A view of what a vMSC looks like is below. The networking and storage components have been simplified.

fabric metro cluster diagram

 


Fix: VMware – Quiesced Snapshots failing – Unexpected error DeviceIoControl

I ran into an interesting problem that took a bit of digging around to both find the root cause and also to find the final fix. When running backups on VMware 5.5 running on NetApp storage I could see some, but not all, VMs failing and throwing up the below errors in the event logs:

Event ID 57 ntfs Warning
The system failed to flush data to the transaction log. Corruption may occur.

Event ID: 137 ntfs Error
The default transaction resource manager on volume \\?\Volume{806289e8-6088-11e0-a168-005056ae003d} encountered a non-retryable error and could not start. The data contains the error code.

Event ID: 12289 VSS Error
Volume Shadow Copy Service error: Unexpected error DeviceIoControl(\\?\fdc#generic_floppy_drive#6&2bc13940&0&0#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} - 00000000000004A0,0x00560000,0000000000000000,0,0000000000353B50,4096,[0]). hr = 0x80070001, Incorrect function.


The key alert here is Event ID 12289. It was also the most off-putting. It initially looked like a floppy drive issue but there was no floppy drive attached to the VM nor were there any floppy drivers installed on the VM. A look around the VMware community forums led me to this posting – https://communities.vmware.com/thread/309844?start=0&tstart=0 It was focused more on vSphere 4.1 however and most of the advice was around installing an older version of VMware Tools. Comment 27 was the jackpot winner. The System Reserved partition was causing the issue.
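If you’re triaging a number of VMs, it can help to pull the device path and the hr code out of the Event ID 12289 text to see which ones show the same misleading floppy reference. The following is a minimal sketch in Python; the function name and regular expressions are my own, and the sample message is the one from my event log with the device path written in its usual \\?\ form.

```python
import re

# Device path is the first argument of DeviceIoControl(...), followed by " - ".
DEVICE_RE = re.compile(r"DeviceIoControl\((?P<device>\S+) - ")
# The result code is reported as "hr = 0x...".
HRESULT_RE = re.compile(r"hr = (?P<hr>0x[0-9A-Fa-f]+)")


def summarize_vss_error(message: str) -> dict:
    """Extract the device path and hr code from a VSS 12289 event message."""
    device = DEVICE_RE.search(message)
    hresult = HRESULT_RE.search(message)
    device_path = device.group("device") if device else None
    return {
        "device": device_path,
        "hresult": hresult.group("hr") if hresult else None,
        "mentions_floppy": bool(device_path) and "floppy" in device_path.lower(),
    }


MESSAGE = (
    r"Volume Shadow Copy Service error: Unexpected error "
    r"DeviceIoControl(\\?\fdc#generic_floppy_drive#6&2bc13940&0&0#"
    r"{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} - 00000000000004A0,0x00560000,"
    r"0000000000000000,0,0000000000353B50,4096,[0]). "
    r"hr = 0x80070001, Incorrect function."
)

summary = summarize_vss_error(MESSAGE)
print(summary["hresult"], summary["mentions_floppy"])
```

Bear in mind that a match only tells you the VM is logging the same message; in my case the floppy reference was a red herring.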

So what does the System Reserved partition do?

The System Reserved partition contains the Boot Manager and Boot Configuration data that are read on start up of the virtual machine. The VM boots from the boot loader in the System Reserved partition and then boots Windows from the System drive. It is also used as a location for the start up files for BitLocker Drive Encryption. If you need BitLocker then you’ll need to have a System Reserved partition. For Windows client OSes that’s a great feature to have but from a server OS perspective, where BitLocker just isn’t used, it’s superfluous. The System Reserved partition is created by default on OS installation so there are two options to remediate:

  1. Remove the partition manually post installation
  2. Remove the partition from your Windows OS templates

I won’t go into the details on how to remove the partition from your templates here but you can find more information over on mydigitallife.info. I ran through the steps myself to do this for all of our Windows templates after finding the root cause of the initial error.

As per one of the links mentioned in Comment 27 in the VMware communities post it’s possible to change the location of the boot files so that the partition can be removed. This information can be found over on geekshangout.com. However the steps didn’t include how to re-claim that partition so that there isn’t an unallocated disk partition sitting in front of the C drive (disk 0). While I haven’t tested backups in this configuration I wouldn’t be surprised if it caused other issues during backup. So below I’ve listed the steps to follow so you can successfully remove the partition as per the steps on geekshangout and then re-claim the space using GParted.

Delete System Reserved partition and reclaim space


VMware – Security vulnerability VMSA-2015-0007

VMware announced over the weekend that some major security vulnerabilities have been identified in vCenter and ESXi 5.0, 5.1 and 5.5 as well as version 6.0. 6.0 Update 1 is not affected. Only the JMX RMI remote code execution is an issue in vSphere 6.0. Three vulnerabilities have been identified and they affect different versions.

ESXi OpenSLP Remote Code Execution

  • Allows unauthenticated users to execute code remotely on ESXi host

vCenter Server JMX RMI Remote Code Execution

  • Allows an unauthenticated remote attacker who is able to connect to the service to execute arbitrary code on the vCenter server

vCenter Server vpxd denial-of-service vulnerability

  • Can allow a remote user to create a denial of service on the vpxd service through unsanitized heartbeat messages

The announcement was broken on both the VMware and TheRegister sites and I’d recommend viewing more information on both of those sites. TheRegister also gives some great background on how the issues were originally identified. The full advisory details including links to the CVE references can be viewed on the VMware Security Advisories site for VMSA-2015-0007.

If you are running vSphere 5.0 the recommendation is to upgrade to v5.0 Update 3e. For vSphere 5.1 upgrade to v5.1 Update 3. For vSphere 6 the recommendation is to patch with Update 1. vSphere 5.5 however has some issues. In order to fix the denial-of-service or the OpenSLP issues it’s advised to upgrade to vSphere 5.5 Update 2. However, VMware have confirmed that resolving the JMX RMI issue requires vSphere 5.5 Update 3, which was released in early September. But a new bug regarding snapshots has been identified in Update 3: if a snapshot is deleted in vCenter it causes the VM to crash. Considering that the majority of backup solutions utilise VMware snapshots, it means that all VMs would reboot each night. Given that uptime is always a business and IT priority, it’s really not a feasible solution.

My advice would be to at least upgrade to vSphere 5.5 Update 2 if you can. Upgrade to vSphere 6.0 Update 1 if possible but that may require considerable research and interoperability checks and may not be on your roadmap just yet. Do not install ESXi 5.5 Update 3 if your backup software depends on VMware snapshots.
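To make the advice above a bit easier to apply across a mixed estate, here is a minimal sketch in Python of the same recommendations as a lookup. The function and variable names are my own, and the handling of 5.5 is my reading of the situation described above rather than anything from the advisory itself, so check VMSA-2015-0007 before relying on it.

```python
# Recommended target per major version, as summarised in this post.
RECOMMENDED = {
    "5.0": "5.0 Update 3e",
    "5.1": "5.1 Update 3",
    "5.5": "5.5 Update 2",  # fixes the denial-of-service and OpenSLP issues
    "6.0": "6.0 Update 1",
}


def recommend(version: str, snapshot_backups: bool = True) -> str:
    """Return the suggested target release for a given vSphere version.

    snapshot_backups: True if the backup software relies on VMware snapshots.
    """
    if version not in RECOMMENDED:
        raise ValueError(f"no recommendation recorded for vSphere {version}")
    # Assumption: 5.5 Update 3 is only worth considering when backups
    # do not depend on snapshots, because of the snapshot deletion bug.
    if version == "5.5" and not snapshot_backups:
        return "5.5 Update 3"
    return RECOMMENDED[version]


for v in ("5.0", "5.1", "5.5", "6.0"):
    print(v, "->", recommend(v))
```

Passing snapshot_backups=False for 5.5 returns Update 3, which is the only 5.5 release mentioned here that addresses the JMX RMI issue.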

VMware Validated Designs

What are VMware Validated Designs?

VMware announced at VMworld earlier this year that they have been working on implementing VMware Validated Designs. This is a fantastic step by VMware and shows a maturity that has come from years of being the number one virtualisation platform. Cisco has had validated designs for years and I refer to them regularly when deploying Cisco related infrastructure. Through the implementation of validated designs VMware is assisting the community to develop and implement consistent designs across infrastructures which will help provide consistency and familiarity not currently present. When a new platform is being deployed the elements to consider can include compute, storage, network, security, automation and operations. These are not just reference architectures; the validated designs are continuously updated.
This video gives a bit more of an explanation around what VMware Validated Designs are. The designs have been split into pods: Management, Edge and Compute. Management is made up of vCenter Server, vRealize Operations Manager, vRealize Log Insight and VMware Horizon. Network and security are provided by VMware NSX and storage is provided by VSAN. The Edge pod provides additional NSX support to allow external access to compute workloads. The Compute pod is the heavy lifting pod.

 


HowTo: vROPS – Blue Medora Cisco UCS and NetApp Management Pack installs

Following on from installing vROPS a few months back I finally made the jump to install the Blue Medora management packs for both Cisco UCS and NetApp to get greater visibility into my virtual environment and the underlying physical infrastructure. I’m really looking forward to seeing what these management packs have to offer. While I’m not going to cover off the dashboards provided by the management packs in this post it is something I plan on revisiting once it’s been in use for a while and I’ve done a bit more playing around with it. The reason I’m posting this deployment process is that despite Blue Medora having a decent installation guide it’s not always 100% clear, so I’ve done this to hopefully help guide a few others through the process a bit easier.

Cisco UCS Management Pack Deployment

Before you begin this deployment you can download trial versions from Blue Medora. If you want a permanent installation you’ll need to purchase licenses from Blue Medora.

1: In vRealize Operations Manager go to the Administration -> Solutions

Blue Medora UCS Management Pack Install Step 1

Unable to upgrade VMware Tools – VMwareTools64.msi is not a valid package

A while back I upgraded my vCenter and vSphere environment to 5.5 Update 2. As part of this upgrade VMware Tools was upgraded on most servers. Except, that is, for vCenter itself. This wasn’t a major issue but other issues began to arise where alerts came for disk consolidation problems. On investigation most KB articles were pointing towards upgrading VMware Tools as the fix. So that’s what I tried. When running the VMware Tools installation on the vCenter VM I got an error that the VMwareTools64.msi was not a valid installation package and to find the correct package to install. I tried a number of things to get this to work but it would just not run the VMwareTools64.msi. I also couldn’t update the VM through Update Manager.

vmwaretools64-error

The first step was to get the correct VMware Tools version as a standalone ISO. Since I performed the upgrade VMware have released a new version of VMware Tools, version 10, and that’s the only one that can be downloaded from the support site. The version I was looking for was 9.4.5 and I didn’t want to install version 10 without first deploying it to the test environment. This all led me to Vladan’s website article called Manual Download of VMware Tools from VMware Website. Thanks to this article I was quickly able to get the VMware Tools package that I needed. You can go to http://packages.vmware.com/tools and select the VMware Tools version you need for download. The ISO was added to the ISO Datastore and mounted to the VM.

Following this I tried a number of different VMware KB articles but the one I finally found to work was KB1012693. This involved opening a command prompt, changing directory to the CD drive where VMware Tools was mounted and running the command:

setup64.exe /c

Once that completed I re-ran the VMTools installation and it completed successfully. Following the server reboot the VMTools are showing as up to date in vCenter.