Fix: Cannot run upgrade script on host, ESXi 5.5 

During a recent upgrade I found that one of the ESXi hosts just would not update using Update Manager. The error I was seeing was “Cannot run upgrade script on host”.

After a bit of searching I found an article about upgrading ESXi 5.1 to 5.5, and its steps also fixed the issue I was seeing.

In order to fix the issue I performed the following steps:

Step 1: Disable HA for the cluster

Disable Cluster HA

Step 2: Go to vCenter Networking. Select the distributed vSwitch and then select the Hosts tab. From here, right-click on the host you need to reboot and select Remove from vSphere Distributed Switch.

Remove Distributed Switch

Click Yes to remove the host from the switch.

Confirm vDS Removal

Step 3: Remove the host from the cluster

Remove ESXi host from cluster

Step 4: Put the host into maintenance mode and then choose to reboot.

Enter Maintenance Mode

Step 5: Connect via SSH to the ESXi host and run the following commands to uninstall the FDM agent:

```shell
# Copy the FDM (vSphere HA agent) uninstaller to /tmp, make it executable, then run it
cp /opt/vmware/uninstallers/VMware-fdm-uninstall.sh /tmp
chmod +x /tmp/VMware-fdm-uninstall.sh
/tmp/VMware-fdm-uninstall.sh
```

SSH Host FDM Uninstaller

Step 6: Reboot the host

Reboot the host

Step 7: Add the ESXi host back to the cluster

rejoin host to cluster step 1

rejoin host to cluster step 2

rejoin host to cluster step 3

rejoin host to cluster step 4

Step 8: Re-add the host to the distributed vSwitch. Go to Networking and select the distributed vSwitch. Right-click it and select Manage Hosts.

Manage vDS

Select the host

Select Host

Select the vNICs to be managed by the switch as uplinks

Manage vDS uplinks

Step 9: Turn vSphere HA back on for the cluster the host resides on.

Turn on vSphere HA

Step 10: Run the upgrade again from Update Manager and this time it will work.

How To: Upgrade to ESXi 5.5 Update 3b on Cisco UCS

ESXi upgrade preparation

With Cisco UCS you really need to make sure that your ESXi hosts are running the correct driver versions. If you're running NFS or FCoE storage to your ESXi hosts, as either datastores or RDM disks, then it's critical that you have the right fnic and enic drivers. Even if you use the Cisco Custom image for ESXi upgrades, the enic and fnic drivers may not be correct according to the compatibility matrix. I've had this issue in the past: NFS datastores on a Dev ESXi host intermittently went offline, and the resolution was to upgrade the enic driver, which handles Ethernet storage connectivity (fnic handles FC and FCoE).

The best place to check is VMware's Compatibility Guide for IO devices, which comes under System/Servers. To find out which drivers you currently have, you need to check the driver versions on the ESXi hosts, which can be done by following KB1027206. Using the values for the Vendor ID, Device ID, Sub-Vendor ID and Sub-Device ID, it's possible to pinpoint the interoperability with your respective hardware. In my case I have both VIC1340 and VIC1240 adapters in the mix, so I had to go through the process twice. Primarily you'll be using the 'ethtool -i' command to find the driver version.

For example, you can check the UCS VIC 1240 under FCoE CNAs for ESXi 5.5 Update 3 in the compatibility guide.
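Because the check mostly comes down to reading `ethtool -i` output, here is a minimal Python sketch for pulling the driver name and version out of that text once you've copied it off the host. `parse_ethtool_info` is a hypothetical helper, not part of any VMware or Cisco tool, and the sample output is illustrative only: it assumes the usual `key: value` layout, and the exact fields printed on your ESXi build may differ.

```python
def parse_ethtool_info(output: str) -> dict:
    """Turn `ethtool -i <vmnic>` output into a dict of field -> value.

    Each line is assumed to be "key: value". Splitting on the first colon only
    keeps values such as PCI bus addresses (which contain colons) intact.
    Lines without a colon are ignored.
    """
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Illustrative output only; run `ethtool -i vmnic0` on the host for real values.
SAMPLE = """driver: enic
version: 2.1.2.71
bus-info: 0000:06:00.0"""

if __name__ == "__main__":
    info = parse_ethtool_info(SAMPLE)
    print(info["driver"], info["version"])
```

Running the same parser over each vmnic's output gives you the values to look up in the compatibility guide.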

In the screenshot you can see the enic driver version I'm running: 2.1.2.71, which doesn't match the version that will be installed as part of the Cisco Custom ISO image. This shows that the enic driver will need to be upgraded as part of the process.
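Comparing the installed driver version against the one listed in the compatibility guide is easy to get wrong with a plain string comparison. The sketch below compares dotted versions numerically. It assumes every component is a plain number (a version containing letters would need different handling), and the required version used in the example is a hypothetical placeholder, not a value from the VMware compatibility guide.

```python
def version_tuple(version: str) -> tuple:
    """Convert a dotted driver version such as "2.1.2.71" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, required: str) -> bool:
    """True when the installed driver is older than the required one.

    Comparing tuples of integers avoids the string-comparison trap where
    "2.1.2.71" would sort after "2.1.2.100".
    """
    return version_tuple(installed) < version_tuple(required)


if __name__ == "__main__":
    installed = "2.1.2.71"  # the version reported on my host
    required = "2.3.0.0"    # HYPOTHETICAL: take the real value from the compatibility guide
    print(needs_upgrade(installed, required))
```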

Fix: vCenter failure to upgrade – unable to configure log browser windows service

During a recent upgrade from vCenter Server 5.5 Update 2d to vCenter Server 5.5 Update 3b, the upgrade kept failing at the Web Client stage. After successfully upgrading Single Sign-On, I proceeded with the upgrade of the vSphere Web Client and got the following error during the installation:

Error 29702 unable to configure log browser windows service please check vminst.log in system temporary folder for details

The update to 5.5 Update 3b filled up the disk, leaving the installation process unable to finish the upgrade. The SSO install worked, but the Web Client install failed with error 29702. The primary issue was that the SSO upgrade had taken up over 40GB of space on the C: drive. I searched for fixes and found a suggested workaround, but before removing the Java Components and reinstalling them I wanted to check the procedure with support.
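Since the root cause was the C: drive running out of space, a quick free-space check before rerunning the installer could save a failed attempt. This is a minimal Python sketch using the standard library's `shutil.disk_usage`, assuming Python is available on the vCenter server (the same check is just as easy to do by hand in Explorer). The 40GB threshold is a hypothetical figure based only on the space the SSO upgrade consumed in this case, not a VMware requirement.

```python
import os
import shutil


def free_space_gib(path: str) -> float:
    """Free space on the volume that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_headroom(path: str, required_gib: float) -> bool:
    """True if the volume holding `path` has at least `required_gib` free."""
    return free_space_gib(path) >= required_gib


if __name__ == "__main__":
    # On the vCenter Windows server this is the C: drive; "/" elsewhere.
    target = "C:\\" if os.name == "nt" else "/"
    # HYPOTHETICAL threshold, based only on the ~40GB the SSO upgrade consumed here.
    required = 40
    print(f"{target}: {free_space_gib(target):.1f} GiB free, "
          f"enough headroom: {has_headroom(target, required)}")
```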

The steps I followed to fix the issue were:

Step 1: Go to Control Panel, select VMware vCenter Server – Java Components and select Uninstall.

vmware java component uninstall

Step 2: Click OK to confirm the uninstall

vmware java component uninstall step 2

Step 3: Click Yes to confirm reboot

java component uninstall step 3

Step 4: After the reboot you can begin the upgrade process again, and this time it should succeed. Run the vCenter installer and from Custom Install select vCenter Single Sign-On. Click Next.

vcenter upgrade step 1

Step 5: Click Install

vcenter upgrade step 3

Step 6: The single sign-on components will begin to install, including components such as OpenSSL

vcenter upgrade step 3

One of the key components being installed is VMware JRE.

vcenter upgrade step 4 vmware JRE

Step 7: If you get prompted to close some applications select “Close the applications and attempt to restart them”. Click Ok.

vcenter upgrade step 5

Click OK at the prompt to close the applications automatically

vcenter upgrade step 6

Step 8: Click Finish to complete the Single Sign-On upgrade

vcenter upgrade step 7

Step 9: Click on vCenter Web Client to begin the next stage of the upgrade

vmware upgrade step 8

Step 10: Click Yes to continue

vmware upgrade step 9

Step 11: Click Accept License agreement and click Next

vmware license agreement

Step 12: Click Install to begin the web client installation

vsphere web client install

Step 13: Click Finish to complete the installation

vsphere web client installation completion

After clicking Finish, click OK on the dialog advising that the services will take a few minutes to restart

vsphere web client installation completion 1

Step 14: Select vCenter Inventory Service and click Install

vcenter inventory service upgrade step 1

Step 15: Click Yes for Inventory Service install

vcenter inventory service upgrade step 2

Step 16: Click Next to continue the installation process

vcenter inventory service upgrade step 3

Step 17: Click Accept License agreement and click Next

vcenter inventory service upgrade step 4

Step 18: Click Install for inventory service

vcenter inventory service upgrade step 5

Step 19: Click Finish on completion

vcenter inventory service upgrade step 6

Step 20: Install vCenter Server

vcenter server upgrade step 1

Step 21: Click Ok to continue

vcenter server upgrade step 2

Step 22: Click Next to continue

vcenter server upgrade step 3

Step 23: Click to accept the license and click Next

vcenter server upgrade step 4

Step 24: Enter the database user login credentials (in my case, VC_User)

vcenter server upgrade step 5

Step 25: Click Install at the Customer Experience Improvement Program screen

vcenter server upgrade step 6

Step 26: Click Finish to complete the installation

vcenter inventory service upgrade step 6

Fix: Cisco B200 M4 – FlexFlash – FFCH_ERROR_OLD_FIRMWARE_RUNNING error

During a recent upgrade of Cisco B200 M4 blades I got the following error:

FlexFlash FFCH_ERROR_OLD_FIRMWARE_RUNNING

I really wasn't sure what was causing the issue, but it turned out to be a known bug for M4 blades. More details can be found on Cisco Bug Search (note: you'll need a Cisco login to access the site). The issue affects B200 M4 blades upgraded to 2.2(4) or higher.

The workaround is quite easy: reset the FlexFlash controller using the steps below:

Step 1: Select Equipment -> Chassis # -> Server # -> Inventory -> Storage -> Reset FlexFlash Controller

Step 2: Click Yes to reset the FlexFlash controller

Step 3: Click Ok on reset notification

Fix: Cisco UCS B200 M4 Activation Failed

During a recent upgrade I ran into a problem with the activation of a B200 M4 blade. This was after the infrastructure firmware upgrade, and the next step was to upgrade the server firmware. However, before I could upgrade the server firmware, the B200 M4 blades showed the following error:

Activation failed and Activate Status Set to Failed

This turned out to be due to the B200 M4 blades shipping with version 7.0 of the board controller firmware. On investigation with Cisco I found that it's a known bug: CSCuu78484.

You can use the following commands to change the board controller firmware version. There's more information on the Cisco forums, but the commands you need are below, where X is the chassis number and Y is the blade number. In the activate firmware command, select a version lower than the current one:

```
#scope server X/Y
#scope boardcontroller
#show image
#activate firmware version.0 force
#commit-buffer
```
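The "select a lower version" step amounts to picking the highest available image below the one currently running. A minimal Python sketch of that choice follows, assuming you've copied the board controller versions listed by `show image` into a list and that they are plain dotted numbers. `pick_lower_version` is a hypothetical helper and the image list is illustrative, not real `show image` output.

```python
from typing import List, Optional


def version_key(version: str) -> tuple:
    """Numeric sort key for dotted versions such as "7.0"."""
    return tuple(int(part) for part in version.split("."))


def pick_lower_version(current: str, available: List[str]) -> Optional[str]:
    """Return the highest available version strictly below `current`, or None."""
    lower = [v for v in available if version_key(v) < version_key(current)]
    return max(lower, key=version_key) if lower else None


if __name__ == "__main__":
    # HYPOTHETICAL image list; read the real versions from `show image`.
    images = ["5.0", "6.0", "7.0"]
    print(pick_lower_version("7.0", images))
```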

Since I was going to upgrade the blade firmware anyway, there was no point in rolling the firmware back; instead I proceeded with the upgrade, which fixed the issue.

I spoke with TAC and they advised that the error could be ignored and I could proceed with the UCS upgrade. The full details of the upgrade can be found in another post.

How To: Cisco UCS Firmware Upgrade 2.2 to 3.1 with Auto-Install

Recently I had to upgrade our ESXi hosts from Update 2 to Update 3 due to security patch requirements. This requirement stretches across two separate physical environments, one running IBM blades and the other running Cisco UCS blade chassis in a FlexPod configuration. The upgrade paths for the two are slightly different: they run on different vCenter platforms, and one of them runs VMware SRM and is in linked mode. I'm not going to discuss the IBM upgrades, but I did need to upgrade the Infrastructure and Server firmware in Cisco UCSM.

Before you begin any upgrade process, I highly recommend reading the release notes to make sure that a) an upgrade path exists from your current version, b) you are aware of any known issues in the new version and c) the features you want exist in the new version.

UCS Upgrade Prep Work

Check the UCS Release Guides

Check the release notes to make sure all the components and modules are supported. The release notes for UCS Manager can be found on their site. The link is listed further below in the documents section.

Some of the things to check within the release notes are:
  • Resolved Caveats
  • UCS version upgrade path
  • UCS Infrastructure Hardware compatibility
  • Minimum software version for UCS Blade servers

Open a Pre-Emptive Support Call

I opened a call with Cisco TAC to investigate the discrepancy in the firmware versions. The advice was to downgrade the B200 M4 server firmware to 4.0(1). However, as I was planning to upgrade anyway, I confirmed that the best option was to upgrade to the planned 3.1 version. As part of this upgrade I will also upgrade all the ESXi hosts on that site on the same day. A second UCS domain on another site will be upgraded on a later date.

Cisco Data Center User Group Melbourne First Meetup

Last night we hosted the first Cisco Data Center User Group in Melbourne. It was a successful night with a great turnout and excellent interaction and networking between everyone who attended. Everyone was enthusiastic and willing to take part, and that really made it a fantastic night.

The user group was formed with the intention of creating a space where IT professionals can come together in a relaxed environment to network, have a drink and learn about data center technology. We wanted an interactive and social atmosphere, and thanks to everyone who attended and took part, that's exactly what we achieved.

Cisco DCUG Melbourne Members photo

One of the things I liked most about the meetup was the attendance of people from other community groups: Craig Waters (@cswaters1) from the VMware VMUG community, Brett Johnson (@brettjohnson008) from the vBrownBag community and one of the presenters, Will Robinson, from the NetApp A-Team. The support from other communities is great and we really appreciate it.

The night itself began with an introduction from Derek Hennessy (@derekhennessy) and Chris Partsenidis (@cpartsenidis) on how the user group idea was formed. A shout out went to Lauren Friedman (@lauren) from Cisco for her help and support for getting the user group off the ground. We swiftly moved onto the first speaker of the night, Chris Gascoigne (@chrisgascoigne).

Introduction

Chris is a Technical Architect with the Cisco ANZ Data Center team and has a focus on ACI, Nexus 9000, Automation/Orchestration and DevOps. Chris ran through a few slides on how network engineers can leverage tools such as Puppet, Ansible and Chef to implement a DevOps framework. He then ran a demo of managing a Nexus 9000 switch from a bash shell and deploying Puppet configurations to it. Chris also emphasised the need for version control, code review and a controlled path into production. There were a number of questions from the audience as everyone tried to imagine using such tools within their own infrastructure environments. Unfortunately I don't have a copy of Chris' slide deck to make available. A special mention goes out to Chris Partsenidis for performing the important task of being a microphone stand through Chris Gascoigne's demo.

Following Chris' presentation we took a break to let everyone digest the content and the food, and order another drink for the next session. Will Robinson (@oznetnerd) is a Senior Engineer with a focus on networking and storage and a wealth of experience. Will also has a mighty home lab setup, and he gave everyone a run-through of using GNS3 within it. He really hit home on rethinking the physical and logical implementations of networks and gave an example of a complex network he'd designed within GNS3. Everyone was really engaged in Will's presentation, and the questions afterwards came like a quick-fire buzzer round at a quiz. He even managed to make a joking reference to a layer 8 issue for someone using GNS3.

GNS3 Connectivity

I've uploaded the slide decks from the night, and in future we hope to capture the presentations on video and make them available as an archive after each event. All in all it was a great night, and we believe we have now started to build a new community. If you're interested in learning about technology, having a drink and some grub, and meeting and networking with other IT professionals, we're really looking forward to seeing you at the next meeting on Tuesday July 5th.

P.S. Thanks to Chris for the photo of the attendees

Cisco Data Centre User Group – Melbourne

cisco next gen data center user group

Next month we're starting a new user group for Cisco Data Center in Melbourne. It's being run by two Cisco Champions: myself and Chris Partsenidis. Chris and I met up after the recent Cisco Live in Melbourne and got chatting about how there's no real community around Cisco technology, so we reached out to Lauren Friedman (@lauren). Lauren was super helpful and has supported the creation of the Cisco Data Center User Group. This is something Lauren is working on from a global perspective, and we're delighted to be laying the groundwork in Australia.

This user group is centered around Cisco Next-Generation Data Centers and is for anyone that uses Cisco technology or that of the extended ecosystem. Our meetup is a fantastic opportunity to get to know others in the community over some snacks and beers in a relaxed and social environment. While the group is supported by Cisco, don’t expect sales pitches. We’ll focus on enabling a local community for Cisco Data Center users to share experiences, network and to learn more about both technology and careers. We openly invite submissions for topics and presentations from any members.

Some of the topics we’re looking to cover in the coming months are:

  • Cisco HyperFlex
  • DevOps
  • Cisco Nexus Switching
  • Big Data Analytics
  • Data Center Storage
  • CCNA DC and beyond
  • Cisco ACI and Nexus 9000
  • Operations and Data Center Management
  • HomeLab setup
  • Exam Preparation and certification
  • Automation and Orchestration
  • We’re open to requests from the community for topics of interest

The user group will meet on the first Tuesday of every month at The Crafty Squire, 127 Russell Street in Melbourne CBD. We'll be located upstairs in Porter Place. Our first meeting will be on Tuesday June 7th, and all meetings will take place between 5:30 and 7:30PM.

Crafty Squire Porter Place

More details about the regular meet ups can be found over at Cisco Data Center User Group page on Meetup.com. This page will be updated regularly with the meeting agendas and speakers. We look forward to seeing you there, please don’t be shy and come along to say hello. Welcome to the community.

vMotion to vNotions

Those who frequent the site will have noticed quite a few changes recently. I've migrated the blog from WordPress.com to a self-hosted WordPress site, and the name has also changed from virtualnotions to vNotions. I wanted more control over the site and the ability to develop it over time into something else as it continues to grow. WordPress.com is excellent as a free resource, but I wanted to be able to customise more.

I really wasn't sure what the best hosting solution would be, as there are a number of options: managed, managed hosted, virtual private server (VPS) and also running WordPress in AWS. I turned to Twitter to see if anyone had any recommendations for hosting WordPress. The first reply came from Mike Andrews (@trekintech), and I have to thank him for the recommendation. I looked at a number of different providers and settled on DigitalOcean, which Mike had put forward. DigitalOcean has a strong community forum and supporting documentation, so it was very easy to get everything set up. Each VPS in DigitalOcean is called a droplet, and it's very quick to deploy a new server instance.

I stumbled across ServerPilot.io, which allows quick deployment of apps on DigitalOcean VPS instances. ServerPilot takes a lot of the hassle out of setting up new apps, and given that it also has a free option it's very appealing. It also deployed WordPress using Nginx, so it's considerably faster than a plain LAMP stack with Apache. For quick reference, check out this guide for installing WordPress on Ubuntu and also this one on installing WordPress on DigitalOcean. There's also a good guide on setting up WordPress on DigitalOcean over at MyBloggingThing. It was a straightforward process to set up a new instance of WordPress and migrate the content from the old wordpress.com site to the new vNotions.com site. Once the site was migrated and fully operational, I enabled the CloudFlare CDN to improve access speed from geographically dispersed locations. All in all, it was a relatively painless process.

Right now I’m tidying up the posts on the site to clear out any old posts that are no longer relevant. I’d like to thank Mike Andrews for his feedback that set the ball rolling. For anyone thinking of checking out DigitalOcean I’d definitely recommend jumping right in. The support team at DigitalOcean were also top class and replied very quickly to an issue I had (self-inflicted I might add). vNotions has vMotioned from VirtualNotions.

Cisco Live Session Review

I gave a recap of Cisco Live Melbourne in another post and had intended to provide a detailed look at each of the sessions I attended as part of that post, but it became a bit long-winded, so I've broken it out into separate posts, organised by day.

Day 1:

TECCOM-2001 – Cisco Unified Computing System

As someone working towards CCNA and CCNP in Cisco Data Center, I found this extra technical seminar invaluable; it opened my eyes to a lot of areas that were unknown to me. The session was an 8-hour, full-on overview of Cisco UCS, the components that make up the solution and how they all work together. It wasn't a deep-dive session, however, so if you already have a really good working knowledge of UCS and know what's under the covers, this session probably isn't for you. That said, I think there are always opportunities to learn something new.

The session was broken down into 6 parts.

  • UCS Overview
  • Networking
  • Storage Best Practices
  • UCS Operational Best Practices
  • UCS does Security Admin
  • UCS Performance Manager

Some of the main takeaways from the session were around the recent Gen 3 releases of the UCS hardware, including the Fabric Interconnects and IOMs. They also discussed the new features in the UCS Manager 3.1 code base release. Some of the new features of UCSM and the hardware are listed below:

UCS Manager 3.1

  • Single code base (covers UCS Mini, M-Series and traditional UCS)
  • HTML 5 GUI
  • End-to-end 40GbE and 16Gb FC with 3rd Gen FIs
  • M-Series cartridges with Intel Xeon E3 v4 processors
  • UCS Mini support for a second chassis
  • New NVIDIA M6 and M60 GPUs
  • New PCIe-based storage accelerators

UCS Management Portfolio

Next Gen Fabric Interconnects:

FI6332:

  • 32 x 40GbE QSFP+
  • 2.56Tbps switching performance
  • 1RU & 4 fans

FI6332-16UP:

  • 24x40GbE QSFP+ & 16xUP Ports (1/10GbE or 4/8/16Gb FC)
  • 2.43Tbps switching performance

IOM 2304:

  • 8 x 40GbE server links & 4 x 40GbE QSFP+ uplinks
  • 960Gbps switching performance
  • Modular IOM for UCS 5108
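The quoted switching figures can be reproduced with simple arithmetic, assuming each figure counts every port at full line rate in both directions (full duplex), with the 16 unified ports on the FI6332-16UP running at their 16Gb FC maximum. That assumption is mine rather than Cisco's stated method, but the numbers line up:

```python
def switching_capacity_gbps(ports):
    """Aggregate full-duplex capacity: sum of (port count x line rate), doubled.

    `ports` is a list of (count, gbps_per_port) tuples.
    """
    return 2 * sum(count * rate for count, rate in ports)


FI6332 = switching_capacity_gbps([(32, 40)])                 # 32 x 40GbE
FI6332_16UP = switching_capacity_gbps([(24, 40), (16, 16)])  # 24 x 40GbE + 16 UP at 16Gb FC
IOM2304 = switching_capacity_gbps([(8, 40), (4, 40)])        # 8 server links + 4 uplinks

if __name__ == "__main__":
    for name, gbps in [("FI6332", FI6332), ("FI6332-16UP", FI6332_16UP), ("IOM 2304", IOM2304)]:
        print(f"{name}: {gbps} Gbps")
```

This gives 2560, 2432 and 960 Gbps, which match the 2.56Tbps, 2.43Tbps and 960Gbps figures above.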

Two other notes from this section of the technical session: the FI6300s require UCS Manager 3.1(1), and the M-Series is not yet supported on the FI6300s. There was also an overview of the UCS Mini upgrades, the Cloud Scale and Composable Infrastructure (Cisco C3260) and the M-Series. I've not had any experience with the M-Series modular systems before, and I need to do far more reading to understand them better.

The second part of the session covered MAC pinning and the differences between the IOMs and Mezz cards (for those that don't know, the IOMs are pass-through and the Mezz cards are PCIe cards). One aspect they covered that I hadn't heard about before was UDLD (Uni-Directional Link Detection), which monitors the physical connectivity of cables. UDLD is point-to-point and uses echoing from the FIs out to neighbouring switches to check availability. It's complementary to Spanning Tree and is also faster at link detection. UDLD can be set in two modes, default and aggressive. In default mode UDLD notifies and lets spanning tree manage pulling the link down; in aggressive mode UDLD brings the link down itself.

The Storage Best Practices section looked at the two modes the FIs can be configured in and the capabilities of each. If you're familiar with UCS, there's a fair chance you'll know this already. The focus was on FC protocol access via the FIs and how the switching mode changes how the FIs handle traffic.

FC End-Host Mode (NPV mode):

  • Switch sees FI as server with loads of HBAs attached
  • Connects FI to northbound NPIV enabled FC switch (Cisco/Brocade)
  • FCIDs distributed from northbound switch
  • DomainIDs, FC switching, FC zoning responsibilities are on northbound switch

FC Switching Mode:

  • Connects to Northbound FC switch and normal FC switch (Cisco Only)
  • DomainIDs, FC Switching, FCNS handled locally
  • UCS Direct connect storage enabled
  • UCS local zoning feature possible

The session also touched on how the storage-heavy C3260 can be connected to the FIs as an appliance port. It's also possible via UCSM to create LUN policies for external/local storage access, which can be used to carve up the C3260's storage pool into usable storage. One thing I didn't know was that a LUN needs to have an ID of 0 or 1 for boot from SAN to work. It just won't work otherwise. Top tip right there. During the storage section there was some talk about Cisco's new HyperFlex platform, but most of the details were being held back until the breakout session on Hyper-Converged Infrastructure later in the week.

The UCS Operational Best Practices section primarily covered how UCS objects are structured and how they play a part in pools and policies. For those already familiar with UCS there was nothing new here. However, one small tidbit I walked away with was around pool exhaustion: UCS recursively looks up through the parent organisations until it reaches root, and even up to the global level if UCS Central is deployed or linked. One other note I took was that sub-organisations can be nested a maximum of 5 levels deep. Most of the valuable information from this session was around the enhancements in the latest UCSM updates, broken down into improvements in firmware upgrade procedures, maintenance policies and monitoring. Most of these enhancements are listed here:

Firmware upgrade improvements:

  • Baseline policy for upgrade checks – it checks everything is OK after upgrade
  • Fabric evacuation – can be used to test fabric fail-over
  • Server firmware auto-sync
  • Fault suppression (great for upgrades)
  • Fabric High Availability checks
  • Automatic UCSM Backup during AutoInstall

Maintenance:

  • On Next boot policy added
  • Per Fabric Chassis acknowledge
  • Reset IOM to Fabric default
  • UCSM adapter redundant groups
  • Smart call home enhancements

Monitoring:

  • UCS Health Monitoring
  • I2C statistics and improvements
  • UCSM policy to monitor – FI/IOM
  • Locator LED for disks
  • DIMM blacklisting and error reporting (this is a great feature and will help immensely with troubleshooting)
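The pool exhaustion behaviour from the operational best practices section (look in the local organisation, then each parent up to root, then global pools if UCS Central is linked) can be sketched as a simple walk up a tree. This is a simplified Python model under stated assumptions: pools are matched by name, `Org`, `resolve_pool` and the pool names are hypothetical, and the 5-level limit is applied to sub-organisations below root, as noted in the session. It is not how UCSM is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_SUB_ORG_DEPTH = 5  # sub-organisation nesting limit below root


@dataclass
class Org:
    name: str
    parent: Optional["Org"] = None
    free_ids: Dict[str, int] = field(default_factory=dict)  # pool name -> free IDs

    def depth(self) -> int:
        """Number of levels below root (root itself is 0)."""
        levels, node = 0, self.parent
        while node is not None:
            levels += 1
            node = node.parent
        return levels

    def __post_init__(self) -> None:
        if self.depth() > MAX_SUB_ORG_DEPTH:
            raise ValueError(f"{self.name} is nested deeper than {MAX_SUB_ORG_DEPTH} levels")


def resolve_pool(org: Org, pool: str,
                 global_free: Optional[Dict[str, int]] = None) -> Optional[str]:
    """Walk from `org` up to root looking for a same-named pool with free IDs.

    If nothing is left locally, fall back to global pools (UCS Central), if any.
    Returns where the ID would come from, or None when every level is exhausted.
    """
    node: Optional[Org] = org
    while node is not None:
        if node.free_ids.get(pool, 0) > 0:
            return node.name
        node = node.parent
    if global_free and global_free.get(pool, 0) > 0:
        return "global (UCS Central)"
    return None


if __name__ == "__main__":
    root = Org("root", free_ids={"mac-pool": 10})
    prod = Org("Prod", parent=root, free_ids={"mac-pool": 0})
    print(resolve_pool(prod, "mac-pool"))  # Prod's pool is empty, so root's is used
```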

Fabric evacuation can be used to test fabric fail-over before a firmware upgrade, to ensure NIC bonding works correctly and ESXi hosts fail over correctly to the second vNIC. There's also a new Health tab beside the FSM tab in UCSM.

I have to admit the last two sections of the session were not really for me. I don't know whether it was because it was late in the day, my mind was elsewhere or I was just tired, but I couldn't focus. The sections on security within UCSM and on UCS Performance Manager may well have been interesting on another day, but they just didn't do anything for me. The information was somewhat basic, and UCS Performance Manager felt more like a technical sales pitch. I feel the session would have been better served by looking at higher-level, overarching management tools such as UCS Director rather than a monitoring tool that the vast majority of people are not going to use anyway.

Overall, though, this entire technical session was a great learning experience. The presenters were very approachable, and I took the opportunity to quiz Chris Dunk in particular about the HyperFlex solution. While I may not attend another UCS technical session in the future, I would definitely consider stumping up the extra cash for other technical sessions that may be more relevant to me then. There are a lot of options available.

After the sessions were completed I headed down to the World of Solutions opening and wandered around for a bit. As I entered I was offered an array of free drinks. Under other circumstances I would have jumped at the chance, but I'm currently on a 1-year alcohol sabbatical, so instead I hovered around the food stand with the fresh oysters. The World of Solutions was pumping. I didn't really get into any deep conversations, but I did take note of which vendors were present and who I wanted to interrogate later in the week. I left well before the end of the reception so I could get home early; the next day was planned to be a big day anyway.

 
