Fix: Cannot run upgrade script on host, ESXi 5.5 

During a recent upgrade I found that one of the ESXi hosts just would not update using Update Manager. The error I was seeing was “Cannot run upgrade script on host”.

After a bit of searching I found this article, which related to upgrading ESXi 5.1 to 5.5, but its steps worked well to fix the issue I was seeing.

In order to fix the issue I performed the following steps:

Step 1: Disable HA for the cluster

Disable Cluster HA

Step 2: Go to vCenter Networking. Select the distributed vSwitch and then select the Hosts tab. From here, right-click the host you need to reboot and select Remove from vSphere Distributed Switch.

Remove Distributed Switch

Click Yes to remove the host from the switch.

Confirm vDS Removal

Step 3: Remove the host from the cluster

Remove ESXi host from cluster

Step 4: Put the host into maintenance mode and then reboot it.

Enter Maintenance Mode
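
If you'd rather do Step 4 from an SSH session than from the vSphere Client, the following is a minimal sketch. It assumes SSH is already enabled on the host and that running VMs have been migrated off or shut down first. The `run` and `enter_maintenance_and_reboot` helper names and the default reboot reason are hypothetical; setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Minimal sketch: Step 4 from the ESXi shell instead of the vSphere Client.
# run() and enter_maintenance_and_reboot() are hypothetical helper names.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enter_maintenance_and_reboot() {
  reason="${1:-Removing FDM agent before upgrade}"
  # Enter maintenance mode (migrate or shut down running VMs beforehand)
  run esxcli system maintenanceMode set --enable true
  # Show the current state so you can confirm it reports Enabled
  run esxcli system maintenanceMode get
  # esxcli only reboots a host that is already in maintenance mode
  run esxcli system shutdown reboot --reason "$reason"
}
```

Once the functions are loaded on the host, calling `enter_maintenance_and_reboot` runs the three esxcli commands in order; with `DRY_RUN=1` it just echoes them so you can review them first.
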
Step 5: Connect via SSH to the ESXi host and run the following commands to uninstall the FDM agent:

```shell
# Copy the FDM (vSphere HA agent) uninstaller to a writable location and run it
cp /opt/vmware/uninstallers/VMware-fdm-uninstall.sh /tmp
chmod +x /tmp/VMware-fdm-uninstall.sh
/tmp/VMware-fdm-uninstall.sh
```

SSH Host FDM Uninstaller
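
Before rebooting in Step 6 it can be worth confirming that the agent has actually gone. This is a minimal sketch, assuming the HA agent appears in `esxcli software vib list` under a name containing "fdm" (typically `vmware-fdm`, but treat that as an assumption and check your own output); `fdm_removed` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: check that the FDM (vSphere HA) agent VIB is no longer installed.
# Assumption: the agent's VIB name contains "fdm" (e.g. vmware-fdm).
# Usage on the host:  esxcli software vib list | fdm_removed

fdm_removed() {
  # Reads the VIB list on stdin; succeeds only if no line mentions fdm
  if grep -qi 'fdm'; then
    echo "FDM agent still installed"
    return 1
  fi
  echo "FDM agent removed"
}
```
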
Step 6: Reboot the host

Reboot the host
Step 7: Add the ESXi host back to the cluster

rejoin host to cluster step 1

rejoin host to cluster step 2

rejoin host to cluster step 3

rejoin host to cluster step 4
Step 8: Re-add the host to the distributed vSwitch. Go to Networking, select the distributed vSwitch, right-click it and select Manage Hosts.

Manage vDS

Select the host

Select Host

Select the vmnics to be used as uplinks managed by the switch

Manage vDS uplinks

Step 9: Turn vSphere HA back on for the cluster the host resides on.

Turn on vSphere HA

Step 10: Run the upgrade again from Update Manager and this time it will work.


How To: Upgrade to ESXi 5.5 Update 3b on Cisco UCS

ESXi upgrade preparation

With Cisco UCS you really need to make sure that your ESXi hosts are running the correct driver versions. If you're running NFS or FCoE storage to your ESXi hosts, as either datastores or RDM disks, then it's critical that you have the right fnic and enic drivers. Even if you use the Cisco Custom image for ESXi upgrades, the enic and fnic drivers may not be correct according to the compatibility matrix. I've had this issue in the past: NFS datastores on a Dev ESXi host were intermittently going offline, and the resolution was to upgrade the enic driver, which handles Ethernet storage connectivity.

The best place to go is VMware's compatibility site for IO drivers, which comes under Systems/Servers. To find out which drivers you currently have, check the driver versions on each ESXi host by following KB1027206. Using the values for the Vendor ID, Device ID, Sub-Vendor ID and Sub-Device ID, it's possible to pinpoint interoperability with your specific hardware. In my case I have both VIC1340 and VIC1240 in the mix, so I had to go through the process twice. Primarily you'll be using the 'ethtool -i' command to find the driver version.
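
The KB1027206 checks boil down to a handful of commands run over SSH on each host. This is a minimal sketch, assuming ESXi 5.5 with SSH enabled: `vmkchdev -l` lists the vendor/device and sub-vendor/sub-device IDs, `ethtool -i` reports the enic driver version per vmnic, and `vmkload_mod -s fnic` shows the fnic module details (fnic is an HBA driver, so ethtool doesn't apply). The `driver_version` and `show_nic_drivers` helper names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of a KB1027206-style driver check on an ESXi 5.5 host.
# driver_version() and show_nic_drivers() are hypothetical helper names.

driver_version() {
  # Parses `ethtool -i <vmnic>` output on stdin and prints "<driver> <version>"
  awk -F': ' '$1 == "driver" { d = $2 } $1 == "version" { v = $2 } END { print d, v }'
}

show_nic_drivers() {
  # PCI IDs (vendor:device subvendor:subdevice) for each vmnic
  vmkchdev -l | grep vmnic
  # Driver name and version for every vmnic (skip the esxcfg-nics header row)
  for nic in $(esxcfg-nics -l | awk 'NR > 1 { print $1 }'); do
    printf '%s: ' "$nic"
    ethtool -i "$nic" | driver_version
  done
  # fnic is an HBA driver, so check the loaded module instead of ethtool
  vmkload_mod -s fnic | grep -i version
}
```

Run `show_nic_drivers` on each host and compare the IDs and versions against the compatibility guide entry for your VIC model.
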

enic_driver_check_vmware_kb_steps
For example, you can check the UCS VIC 1240 entry for FCoE CNAs on ESXi 5.5 Update 3 here.

In this image you can see the version of the enic driver I'm running. Version 2.1.2.71 doesn't match the firmware version that will be installed as part of the Cisco Custom ISO image, which shows that the enic driver will need to be upgraded as part of the process.
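
With several hosts, comparing versions by eye gets tedious. Here is a minimal sketch of a field-by-field numeric comparison of dotted versions such as 2.1.2.71. `version_lt` is a hypothetical helper, and the required version you compare against has to come from the compatibility guide entry for your VIC and ESXi build.

```shell
#!/bin/sh
# Minimal sketch: succeed (exit 0) if dotted version $1 is older than $2.
# Fields are compared numerically, so 2.1.2.71 < 2.1.2.100; missing fields count as 0.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions: not less-than
  }'
}
```

A typical use would be `version_lt "$installed" "$required" && echo "enic driver needs upgrading"`, with both values filled in from `ethtool -i` and the compatibility guide.
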

enic_driver_check_vmware


Fix: vCenter failure to upgrade – unable to configure log browser windows service

During a recent upgrade from vCenter Server 5.5 Update 2d to vCenter Server 5.5 Update 3b, the process kept failing at the Web Client upgrade. After successfully upgrading Single Sign-On I proceeded with the upgrade of the vSphere Web Client and got the following error during the installation:

Error 29702 unable to configure log browser windows service please check vminst.log in system temporary folder for details

The update to 5.5 Update 3b filled up the disk, leaving the installer unable to finish the upgrade. The SSO install worked, but the Web Client install failed with error 29702. The primary issue was that the SSO upgrade had taken up over 40GB of space on the C drive. I searched for fixes and found the following link, but before removing the Java Components and re-installing I wanted to check the procedure with support.

The steps I followed to fix the issue were:

Step 1: Go to Control Panel, select VMware vCenter Server – Java Components and select uninstall

vmware java component uninstall

Step 2: Click ok to confirm the uninstall

vmware java component uninstall step 2

Step 3: Click Yes to confirm reboot

java component uninstall step 3

Step 4: Following the reboot you can begin the upgrade process once again, and this time it will succeed. Run the vCenter installer and from Custom Install select vCenter Single Sign-On. Click Next.

vcenter upgrade step 1

Step 5: Click Install

vcenter upgrade step 3

Step 6: The single sign-on components will begin to install, including components such as OpenSSL

vcenter upgrade step 3

One of the key components being installed is VMware JRE.

vcenter upgrade step 4 vmware JRE

Step 7: If you get prompted to close some applications, select “Close the applications and attempt to restart them” and click OK.

vcenter upgrade step 5

Click OK on the prompt to close apps automatically

vcenter upgrade step 6

Step 8: Click Finish to complete the Single Sign-On upgrade

vcenter upgrade step 7

Step 9: Click on vCenter Web Client to begin the next stage of the upgrade

vmware upgrade step 8

Step 10: Click Yes to continue

vmware upgrade step 9

Step 11: Accept the license agreement and click Next

vmware license agreement

Step 12: Click Install to begin the web client installation

vsphere web client install

Step 13: Click Finish to complete the installation

vsphere web client installation completion

Once you click Finish, click OK on the dialog advising that the services will take a few minutes to restart

vsphere web client installation completion 1

Step 14: Select vCenter Inventory Service and click Install

vcenter inventory service upgrade step 1

Step 15: Click Yes for Inventory Service install

vcenter inventory service upgrade step 2

Step 16: Click Next to continue the installation process

vcenter inventory service upgrade step 3

Step 17: Accept the license agreement and click Next

vcenter inventory service upgrade step 4

Step 18: Click Install for inventory service

vcenter inventory service upgrade step 5

Step 19: Click Finish on completion

vcenter inventory service upgrade step 6

Step 20: Install vCenter Server

vcenter server upgrade step 1

Step 21: Click OK to continue

vcenter server upgrade step 2

Step 22: Click Next to continue

vcenter server upgrade step 3

Step 23: Click to accept the license and click Next

vcenter server upgrade step 4

Step 24: Enter the database user's login credentials (here, VC_User)

vcenter server upgrade step 5

Step 25: At the Customer Experience Improvement Program screen, click Install

vcenter server upgrade step 6

Step 26: Click Finish to complete the installation

vcenter inventory service upgrade step 6


Melbourne VMUG Review

Once again the Melbourne VMUG UserCon was a massive success, with some great speakers and sessions. Given that there were such IT heavy hitters as Scott Lowe (@scott_lowe), Chris Wahl (@chriswahl) and Keith Townsend (@CTOAdvisor), as well as a number of local IT stars such as Frank Fan (@frankfan7), Anthony Burke (@pandom_), Anthony Spiteri (@anthonyspiteri) and Craig Waters (@cswaters1), it's not surprising that it was a great event.

One of my goals for the day was to attend a number of the community sessions. I found the vBrownBag sessions conducted by Alastair Cooke (@demitassenz) to be the most informative and entertaining sessions of the day, along with those of Chris Wahl. The award for the funniest session of the day went to Simon Sharwood (@ssharwood) from The Register as part of the vBrownBag session. It wasn't just entertaining but also a great insight into how content is derived for the site.

I missed one of the sessions I had intended on getting to, but here's a breakdown of the sessions I did attend.


Melbourne VMUG – Preview


The annual Melbourne VMUG UserCon takes place this Thursday, 25th February. It's also an important day for me as it's my wedding anniversary. I know which one my wife is more interested in! But for the IT community in Melbourne, all eyes will be on the VMUG. This year's event has moved from the old Hilton on the Park to Crown on Southbank. I think this is a good move and makes the VMUG even more accessible than in previous years. Last year's guest speakers were excellent, with Chad Sakac, Vaughan Stewart and John Troyer, and this year it's been lifted another notch. This year the enterprise IT giants include Scott Lowe (@scott_lowe), Keith Townsend (@CTOAdvisor), Brad Tompkins (@VMUG_CEO) and my own personal IT hero Chris Wahl (@ChrisWahl). There will also be vBrownBag sessions hosted by Alastair Cooke (@demitassenz). If you've been following Twitter you'll have seen that Scott's been having issues with flights and United has basically crapped all over his plans. Hopefully things work out for him and he makes it on time to the Sydney VMUG on Tuesday 23rd, but it looks like it'll be a close call. I wish him safe travels from here on.

VMUGs are all about the community. It's the primary reason they exist, and we're incredibly fortunate to have organisers who volunteer their time to put on such a great event. Melbourne has some of the finest at its helm and that has been recognised globally. If you haven't attended before I'd highly recommend fitting it into your calendar. You'll be glad you did and your employer will be glad you did too. It's really worth getting to the keynotes at the UserCon: unlike keynotes at other events they are not strictly vendor focused, and they can provide some real insight into your industry as a whole and even your career path. But the main focus should be the community sessions. Hearing from others out in the field about the trials and tribulations they've had with specific technology is where the real learning takes place. These make up a smaller part of the agenda, and it's something I'd like to see more of in future events, though I also appreciate that it's hard to get speakers for such sessions. For me this year that'll be my focus outside of the keynotes. There are also a number of vendor sessions throughout the day that delve into new technology.

There's a lot of information and knowledge to be gleaned from this event. I'd also recommend working out your agenda before attending and having a ponder over what you'd like to get out of the day. The sessions I'm planning on attending are:

Unfortunately the community sessions clash, but if they didn't I'd attend the following. As I can only be in one place at a time, I'll be at the Chris Wahl session.


Blogs, community and other skills

Early this year I decided to up the ante a bit on my level of blogging. While I had really started to take it a bit more seriously the year before I wanted to make a concerted effort this year. During the months running up to the end of 2014 the traffic on the blog had grown quite significantly from what it had previously been. This was at a point when I wasn’t putting out any content all that regularly so it came as a surprise and encouraged me to think about creating more content. Anthony Burke over at NetworkInferno, a great blog if you get some downtime to have a flick through, wrote an article earlier this year which completely summed up my reasons for doing a blog. It’s called VMUG, Community and you (me). In that post Anthony talks about his VMUG contribution, his blog, career and how other skills have developed. All thanks to taking an active part in the community.

For me, I basically use the blog as a means to share my thoughts and experiences and probably most importantly as a way to cure professional isolation, similar to Anthony. I also see it as a way to provide assistance to someone else who may face similar challenges. I’ve been lucky enough to have been dug out of some holes thanks to someone else taking the time to write up their experiences and fixes to problems and I feel it’s only right that I reciprocate. Maintaining a blog and setting myself challenges to produce x number of blog posts does not come naturally to me. Writing doesn’t come naturally to me. It’s something I’ve struggled with but I’ve found that writing blog posts has been a great way of forcing me to be more concise. Another upside, and this is invaluable really, is that it has helped me formulate my opinions and understanding of technology. Through researching topics to ensure that what I’m writing is accurate I’ve gained a far more in-depth understanding of the core concepts of a number of technologies and this has without doubt made me a better employee.


VMware Metro Storage Cluster Overview

VMware Metro Storage Cluster

VMware Metro Storage Cluster (vMSC) allows vCenter to stretch across two data centers in geographically dispersed locations. Normally, in vSphere 5.5 and below at least, vCenter would be deployed in Linked Mode so that two vCenters can be managed as one. With vMSC, however, it's possible to have one vCenter manage all resources across two sites and leverage the underlying stretched storage and networking infrastructure. I've written previous posts on NetApp MetroCluster describing how a stretched storage cluster is spread across two disparate data centers. I'd also recommend reading a previous post on vMSC by Paul Meehan over on www.virtualizationsoftware.com. The idea behind this post is to provide the VMware view to complement the MetroCluster posts and to give a better idea of how MetroCluster storage links into virtualization environments.

The main benefit of a stretched cluster is that it enables workload and resource balancing across data centers. This helps companies achieve near-zero RTO and RPO and ensure uptime of critical systems, as workloads can be migrated easily using vMotion and Storage vMotion. One thing to keep in mind about vMSC: it's not really sold as a disaster recovery solution but rather as a disaster avoidance solution when combined with the underlying storage. Some of the other benefits of a stretched cluster are:

  • Workload mobility
  • Cross-site automated load balancing
  • Enhanced downtime avoidance
  • Disaster avoidance
  • System uptime and high availability

There are a number of storage vendors that provide the back-end storage required for a vMSC to work. I won't go into the entire list, but you can find out more on the VMware Compatibility Matrix site. The one I have experience with is NetApp MetroCluster, but I know of others from EMC and Hitachi at least. So what components make up a vMSC? It comes down to two things: a layer 2 network extended across the data centers so that vMotion can take place with ease, and a resilient storage platform connected to ESXi via VMFS or NFS datastores. VMware vCenter itself does need some configuration changes, but nothing outside the scope of what a regular VMware admin can implement. A view of what a vMSC looks like is below, with the networking and storage components simplified.

fabric metro cluster diagram

Fix: VMware – Quiesced Snapshots failing – Unexpected error DeviceIoControl

I ran into an interesting problem that took a bit of digging around both to find the root cause and to find the final fix. When running backups on VMware 5.5 running on NetApp storage, I could see some, but not all, VMs failing and throwing the errors below in the event logs:

Event ID: 57 ntfs Warning
The system failed to flush data to the transaction log. Corruption may occur.

Event ID: 137 ntfs Error
The default transaction resource manager on volume \\?\Volume{806289e8-6088-11e0-a168-005056ae003d} encountered a non-retryable error and could not start. The data contains the error code.

Event ID: 12289 VSS Error
Volume Shadow Copy Service error: Unexpected error DeviceIoControl(\\?\fdc#generic_floppy_drive#6&2bc13940&0&0#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} - 00000000000004A0,0x00560000,0000000000000000,0,0000000000353B50,4096,[0]). hr = 0x80070001, Incorrect function.


The key alert here is Event ID 12289. It was also the most misleading: it initially looked like a floppy drive issue, but there was no floppy drive attached to the VM, nor were any floppy drivers installed on it. A look around the VMware community forums led me to this posting – https://communities.vmware.com/thread/309844?start=0&tstart=0 – but it was focused more on vSphere 4.1, and most of the advice was around installing an older version of VMware Tools. Comment 27 was the jackpot winner: the System Reserved partition was causing the issue.
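
To confirm the fault without waiting for the next backup window, you can take a quiesced snapshot by hand from the ESXi shell and then check the guest's event logs. This is a minimal sketch, assuming SSH access to the host; `vim-cmd vmsvc/snapshot.create` takes the VM ID, snapshot name, description, include-memory flag and quiesce flag. The helper names, the snapshot name and the example VM name `web01` are hypothetical, and VM names containing spaces won't parse with this simple awk match. Delete the test snapshot afterwards (for example from Snapshot Manager).

```shell
#!/bin/sh
# Minimal sketch: trigger a quiesced (VSS) snapshot manually to reproduce the errors.
# run(), vmid_for() and quiesce_test() are hypothetical helper names.
# Set DRY_RUN=1 to print the vim-cmd call instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

vmid_for() {
  # Reads `vim-cmd vmsvc/getallvms` output on stdin; prints the Vmid of VM "$1"
  awk -v name="$1" 'NR > 1 && $2 == name { print $1; exit }'
}

quiesce_test() {
  vmid="$1"
  # Arguments: <vmid> <name> <description> <includeMemory 0|1> <quiesced 0|1>
  run vim-cmd vmsvc/snapshot.create "$vmid" quiesce-test "manual VSS test" 0 1
}
```

On the host this would be called as `quiesce_test "$(vim-cmd vmsvc/getallvms | vmid_for web01)"`.
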

So what does the System Reserved partition do?

The System Reserved partition contains the Boot Manager and Boot Configuration Data that are read at startup of the virtual machine. The VM boots from the boot loader in the System Reserved partition and then boots Windows from the system drive. It is also used as the location for the startup files for BitLocker Drive Encryption, so if you need BitLocker you'll need a System Reserved partition. For Windows client OSes that's a great feature to have, but from a server OS perspective, where BitLocker just isn't used, it's superfluous. The System Reserved partition is created by default during OS installation, so there are two options to remediate:

  1. Remove the partition manually post installation
  2. Remove the partition from your Windows OS templates

I won't go into the details of how to remove the partition from your templates here, but you can find instructions over on mydigitallife.info. After finding the root cause of the initial error, I ran through those steps myself for all of our Windows templates.

As per one of the links mentioned in Comment 27 of the VMware communities post, it's possible to change the location of the boot files so that the partition can be removed. This information can be found over on geekshangout.com. However, those steps don't cover how to reclaim the partition's space, so you're left with an unallocated partition sitting in front of the C drive (disk 0). While I haven't tested backups in this configuration, I wouldn't be surprised if it caused other issues during backup. So below I've listed the steps to remove the partition as per geekshangout and then reclaim the space using GParted.

Delete System Reserved partition and reclaim space


VMware – Security vulnerability VMSA-2015-0007

VMware announced over the weekend that some major security vulnerabilities have been identified in vCenter and ESXi 5.0, 5.1 and 5.5, as well as version 6.0. 6.0 Update 1 is not affected, and only the JMX RMI remote code execution issue affects vSphere 6.0. In total three vulnerabilities have been identified, and they affect different versions.

ESXi OpenSLP Remote Code Execution

  • Allows unauthenticated users to execute code remotely on ESXi host

vCenter Server JMX RMI Remote Code Execution

  • Allows an unauthenticated remote attacker who is able to connect to the service to execute arbitrary code on the vCenter Server

vCenter Server vpxd denial-of-service vulnerability

  • Can allow a remote user to create a denial of service on the vpxd service through unsanitized heartbeat messages

The news broke on both the VMware and The Register sites, and I'd recommend reading more on both. The Register also gives some great background on how the issues were originally identified. The full advisory details, including links to the CVE references, can be viewed on the VMware Security Advisories site under VMSA-2015-0007.

If you are running vSphere 5.0, the recommendation is to upgrade to 5.0 Update 3e. For vSphere 5.1, upgrade to 5.1 Update 3. For vSphere 6.0, the recommendation is to patch with Update 1. vSphere 5.5, however, has some issues. To fix the denial-of-service or OpenSLP issues, the advice is to upgrade to vSphere 5.5 Update 2. To resolve the JMX RMI issue, VMware has confirmed that the fix is vSphere 5.5 Update 3, released in early September. But a new snapshot bug has been identified in 5.5 Update 3: if a snapshot is deleted in vCenter, the VM crashes. Since the majority of snapshot-based backup solutions use VMware snapshots, every VM being backed up could crash each night. Considering uptime is always a business and IT priority, that's really not a feasible solution.

My advice would be to at least upgrade to vSphere 5.5 Update 2 if you can. Upgrade to vSphere 6.0 Update 1 if possible, but that may require considerable research and interoperability checks and may not be on your roadmap just yet. Do not install ESXi 5.5 Update 3 if your backup software depends on VMware snapshots.
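
To summarise the remediation advice per version, here is a minimal sketch that maps the version string reported by `vmware -v` on a host to the suggested fix. It encodes only the guidance in this post (check the VMSA-2015-0007 advisory itself before acting), `vmsa_2015_0007_fix` and `esxi_version` are hypothetical helper names, and because the version string alone doesn't show the update level you'll still need to check build numbers.

```shell
#!/bin/sh
# Minimal sketch: map a vSphere version (e.g. "5.5.0" from `vmware -v`) to the
# remediation suggested in this post for VMSA-2015-0007.
# The version string does not include the update level; confirm builds separately.
vmsa_2015_0007_fix() {
  case "$1" in
    5.0*) echo "Upgrade to 5.0 Update 3e" ;;
    5.1*) echo "Upgrade to 5.1 Update 3" ;;
    5.5*) echo "Upgrade to 5.5 Update 2; avoid 5.5 Update 3 if backups use VMware snapshots" ;;
    6.0*) echo "Patch to 6.0 Update 1" ;;
    *) echo "Not covered by this summary" ;;
  esac
}

# `vmware -v` prints e.g. "VMware ESXi 5.5.0 build-NNNNNNN"; field 3 is the version
esxi_version() {
  vmware -v | awk '{ print $3 }'
}
```

On a host this would be used as `vmsa_2015_0007_fix "$(esxi_version)"`.
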

VMware Validated Designs

What are VMware Validated Designs?

VMware announced at VMworld earlier this year that they have been working on VMware Validated Designs. This is a fantastic step by VMware and shows a maturity that has come from years of being the number one virtualisation platform. Cisco has had validated designs for years, and I refer to them regularly when deploying Cisco-related infrastructure. Through validated designs VMware is helping the community develop and implement consistent designs across infrastructures, which will provide a consistency and familiarity not currently present. When a new platform is being deployed, the elements to consider can include compute, storage, network, security, automation and operations. These are not just static reference architectures; the validated designs are updated continuously.
This video gives a bit more of an explanation of what VMware Validated Designs are. The designs have been split into three pods: Management, Edge and Compute. Management is made up of vCenter Server, vRealize Operations Manager, vRealize Log Insight and VMware Horizon. Network and security are provided by VMware NSX, and storage is provided by VSAN. The Edge pod provides additional NSX support to allow external access to compute workloads. The Compute pod does the heavy lifting.

 
