
UCS Director Global Deployment

Last year I presented at the local Cisco DCUG, to a warm and receptive audience, on deploying Cisco UCS Director at a global scale. At the time I was working for a global pharmaceutical company and, following some organisational changes, the requirements of the business, and in turn of IT, changed to match. A key part of the changes focused on global standardisation of IT infrastructure to ensure 24 x 7 operational support. The best way to achieve that goal was automation and orchestration, and Cisco UCS Director was the tool chosen at the time. UCS Director is an absolute beast of a product, and the way Cisco has marketed and managed it reflects badly on them. It has the potential to be the one-stop shop for infrastructure management.

Concept:

Create a global platform to enable physical and virtual automation based on standardised templates and processes.

Purpose:

  • Drive standardisation across 14 global sites, reducing management overheads and complexity
  • Put the company in a position to leverage follow-the-sun support for infrastructure, minimising out-of-hours support at each local site
  • Provide a secure platform that could easily meet strict auditing guidelines
  • Deliver a mechanism to allow end-users to quickly and easily request new virtual machines
  • Streamline the infrastructure request processes and remove existing bottlenecks
  • Drive the business towards a Private Cloud architecture rather than individual silos
  • Reduce licensing costs across the business for multiple existing automation and orchestration platforms
  • Provide a cost model and service catalog to quickly inform projects of their estimated potential costs
  • Integration into the existing service management tool
  • Integration into HP Quality Control for auditing and quality control purposes. This allowed for installation verification scripts to be completed.


Fix: Cisco B200 M4 – FlexFlash – FFCH_Error_old_firmware_Running_error

During a recent upgrade of Cisco B200 M4 blades I got the following error:

FlexFlash FFCH_ERROR_OLD_FIRMWARE_RUNNING

I really wasn’t sure what was causing the issue, but it turned out to be a known bug for M4 blades. More details can be found over on Cisco BugSearch. Note: you’ll need a Cisco login to access the site. Basically, the issue affects B200 M4 blades upgraded to 2.2(4) or higher.

The workaround is quite easy: the FlexFlash controller just needs to be reset. This can be done using the steps below:

Step 1: Select Equipment -> Chassis # -> Server # -> Inventory -> Storage -> Reset FlexFlash Controller

Step 2: Click Yes to reset the FlexFlash controller

Step 3: Click OK on the reset notification


Fix: Cisco UCS B200 M4 Activation Failed

During a recent upgrade I ran into a problem with the activation of B200 M4 blades. The infrastructure firmware upgrade had completed and the next step was to upgrade the server firmware. However, before I could upgrade the server firmware, the B200 M4 blades reported the following error:

Activation failed and Activate Status Set to Failed

This turned out to be due to the B200 M4 blades shipping with version 7.0 of the board controller firmware. On investigation with Cisco I found that it’s a known bug – CSCuu78484.

You can change the board controller firmware version with the commands below. More information is available on the Cisco forums.

#scope server X/Y (chassis X blade Y)

#scope boardcontroller

#show image

#activate firmware version.0 force

>Select a lower version than current one

#commit-buffer

What I found was that, since I was going to be upgrading the blade firmware version anyway, there was no point in dropping the server firmware back. Instead I proceeded with the upgrade, which fixed the issue.

I spoke with TAC and they advised that the error could be ignored and I could proceed with the UCS upgrade. The full details of the upgrade can be found in another post.


How To: Cisco UCS Firmware Upgrade 2.2 to 3.1 with Auto-Install

Recently I had to upgrade our ESXi hosts from Update 2 to Update 3 due to security patch requirements. This requirement stretches across two separate physical environments, one running IBM blades and the other running on Cisco UCS blade chassis in a Flexpod configuration. The upgrade paths for both are slightly different, and they also run on different vCenter platforms. Both of these also have different upgrade paths as one is running VMware SRM and is in linked mode. I’m not going to discuss the IBM upgrades but I did need to upgrade the firmware of the Infrastructure and Servers for Cisco UCSM.

Before you begin any upgrade process I highly recommend reading the release notes to make sure that a) an upgrade path exists from your current version, b) you are aware of any known issues in the new version and c) the features you want exist in the new version.

UCS Upgrade Prep Work

Check the UCS Release Guides

Check the release notes to make sure all the components and modules are supported. The release notes for UCS Manager can be found on Cisco’s site. The link is listed further below in the documents section.

Some of the things to check within the release notes are:

  • Resolved Caveats
  • UCS version upgrade path
  • UCS Infrastructure hardware compatibility
  • Minimum software version for UCS Blade servers

Open a Pre-Emptive Support Call

I opened a call with Cisco TAC to investigate the discrepancy in the firmware versions. The advice was to downgrade the B200 M4 server firmware to 4.0 (1). However, as I was planning on upgrading anyway, I’ve now confirmed that the best option is to upgrade to the planned 3.1 version. As part of this upgrade I will also upgrade all the ESXi hosts on that site the same day. There is a second UCS domain on another site that will be upgraded on another date.


Cisco HyperFlex – Welcome to the HCI Party!

Cisco has finally decided to bring the vodka to spike the punch at the Hyper-Converged Infrastructure party. And it tastes pretty damn good. There have been rumours for a while now that Cisco was working with Springpath and as a major third round investor it’s not surprising to hear about their entrance into the HCI arena. The Register’s Chris Mellor reported about Something bubbling up at Springpath back in early December.  So what is the offspring of Cisco and Springpath called? Cisco HyperFlex!!


The Play:

Hyper-converged systems so far have delivered on simplicity and scale, but there’s been a massive gap: the lack of network integration in existing solutions. Yes, you can use fast top-of-rack switches, and in some cases customers use Cumulus on whitebox top-of-rack switches for software-defined networking, but networking is not a built-in feature of the two leading hyper-converged solutions, Nutanix and SimpliVity.

HyperFlex joins the comprehensive DC portfolio along with UCS, MDS and Nexus. It means that Cisco now has a play in traditional component-based infrastructure, converged infrastructure and now hyper-converged infrastructure. Cisco is adding HyperFlex to give it another string to its software-defined infrastructure bow. It will now have:

  • UCS – compute (service profiles, APIs etc.)
  • ACI – software-defined network
  • HyperFlex – software-defined storage, compute and network


On the initial release Cisco HyperFlex will support file storage and VMware. There are a number of other storage types, such as block and object, and other hypervisors on the roadmap. There’s also going to be container support. Given that Springpath was hypervisor agnostic I’d expect a quick ramp-up from Cisco and a fast feature release cycle.

The Potential:

Like pretty much every other hyper-converged solution Cisco sees its expected use-cases to be:

  • VDI
  • Server virtualisation
  • Test and development
  • Large remote branch offices

UCS Manager is already familiar to multiple thousands of customers worldwide and the server and network deployment settings in HyperFlex come from pre-configured Service Profiles. Service Profiles are well and truly familiar to anyone that has worked with Cisco UCS.  Given that customer base and the familiarity with existing management tools there’s massive potential for Cisco HyperFlex here. There are some well developed existing incumbents in the hyper-converged market with Nutanix leading the way and HyperFlex will allow Cisco to gain a foothold in that rapidly growing market.

The Deep-dive: Read More


Cisco UCS – Unable to deploy Service Profile from Template – [FSM:Failed] Configuring service profile

I was recently deploying new blades within the UCS chassis but found that I was unable to deploy a Service Profile from a template. In one UCS domain there were no issues, but in the second UCS domain it failed with the error [FSM:Failed] Configuring service profile xxx (FSM:sam:dme:LsServerConfigure), along with a number of other minor warnings. The service profile would appear in the list but it was highlighted as having an issue and it could not be assigned to any blades. After a bit of searching around I found an answer on the Cisco Communities forum.


I also created a tech support file and downloaded it to my desktop, extracted the compressed files and opened sam_techsupportinfo in notepad. I did a search for errors and found that there was an issue resolving default identified from UCS Central.
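The manual search described above can also be scripted. The following is only a minimal sketch: it assumes the tech support bundle has already been extracted and that sam_techsupportinfo is a plain-text file. The function name find_error_lines and the default keyword "error" are illustrative choices, not anything provided by UCS.

```python
def find_error_lines(path, keyword="error"):
    """Return (line_number, text) pairs for lines containing the keyword.

    The match is case-insensitive. The keyword default is an assumption;
    adjust it to whatever string you are hunting for.
    """
    needle = keyword.lower()
    matches = []
    # errors="replace" keeps the scan going if the file has odd bytes in it
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

Calling find_error_lines("sam_techsupportinfo") from the extracted folder returns the matching lines with their line numbers, which makes it easier to jump to the right place in a large file.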


The solution was to unregister the server from UCS Central and then to deploy the Service Profile again. To unregister from UCS Central go to Admin Tab -> Communication Management -> UCS Central and select Unregister from UCS Central. Before unregistering, make sure that the Policy resolution controls are how you want them to be. In my case they were all set to local, so unregistering from UCS Central had no real impact. Many users will have UCS Central integration configured to work as it was designed and will use Global policies. Unregistering from UCS Central can have a knock-on impact on how those policies are managed.


Once the unregister had completed I ran the service profile deployment from template again and this time it worked. I believe the issue is down to a time sync problem between UCS and UCS Central. I’m currently working on a permanent workaround.

HowTo: vROPS – Blue Medora Cisco UCS and NetApp Management Pack installs

Following on from installing vROPS a few months back, I finally made the jump to install the Blue Medora management packs for both Cisco UCS and NetApp to get greater visibility into my virtual environment and the underlying physical infrastructure. I’m really looking forward to seeing what these management packs have to offer. I’m not going to cover the dashboards provided by the management packs in this post, but it is something I plan on revisiting once they’ve been in use for a while and I’ve done a bit more playing around with them. The reason I’m posting this deployment process is that, despite Blue Medora having a decent installation guide, it’s not always 100% clear, so I’ve done this to hopefully guide a few others through the process a bit more easily.

Cisco UCS Management Pack Deployment

Before you begin this deployment you can download trial versions from Blue Medora and if you want a permanent installation purchase some licenses from Blue Medora.

1: In vRealize Operations Manager go to Administration -> Solutions


Cisco UCS – FSM:FAILED: Ethernet traffic flow monitoring configuration error

During a recent Cisco UCS upgrade I noticed a critical alert for ethlanflowmon. I hadn’t seen the problem before and it occurred right after I had upgraded the UCS Manager firmware as per the steps listed in a previous post I wrote about UCS Firmware Upgrade. Before proceeding to upgrade the Fabric Interconnects I wanted to clear all alerts where possible. The alert “FSM:FAILED: Ethernet traffic flow monitoring configuration error” on both switches was a cause for concern.

On further investigation I found that this is a known bug when upgrading to versions 2.2(2) and above. I was upgrading from version 2.2(1d) to 2.2(3d). Despite being a critical alert, the issue does not impact any services. The new UCSM software is looking for new features on the FIs that do not exist yet because the FIs have not been upgraded. As soon as you upgrade the FIs this critical alert will clear. More information can be found on Cisco’s support page for the bug, CSCul11595.
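If you want to check quickly whether a target version falls into the "2.2(2) and above" range, the comparison can be sketched in a few lines. This is only an illustration under assumptions: it handles version strings shaped like 2.2(1d), 2.2(2) or 3.1, and it orders the trailing patch letter alphabetically. The function names are hypothetical and are not part of any Cisco tool.

```python
import re

# Matches strings such as "2.2(1d)", "2.2(2)" or "3.1" (assumed shapes only)
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\((\d+)([a-z]*)\))?$")


def parse_ucs_version(text):
    """Turn a version string into a tuple that compares in release order."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, maintenance, patch = match.groups()
    # A missing "(N...)" part is treated as maintenance 0 with no patch letter
    return (int(major), int(minor), int(maintenance or 0), patch or "")


def is_at_least(version, threshold):
    """True when version is the same as or newer than threshold."""
    return parse_ucs_version(version) >= parse_ucs_version(threshold)


print(is_at_least("2.2(3d)", "2.2(2)"))
print(is_at_least("2.2(1d)", "2.2(2)"))
```

Comparing tuples rather than raw strings avoids surprises such as "2.2(10)" sorting before "2.2(2)" in a plain text comparison.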

 


UCS Director – BareMetal Agent Installation Version 5.2, Upgrade to 5.3

UCS Director Baremetal Agent Installation:

Before commencing the installation of the Baremetal Agent appliance I would recommend making sure that UCS Director is fully installed and available. If you need to install UCS Director from scratch there’s some great documentation on the Cisco site, but you can also check out the blog post by Jeremy Waldrop. It’s for an older version of UCS Director but the installation steps still apply to the current version. If you are upgrading from a previous version of UCS Director then you can check out a previous post I did on upgrading UCS Director from 5.1 to 5.3.

Useful Documents:

Cisco UCS Director Baremetal Agent Installation and Configuration Guide, Release 5.2

Cisco UCS Director Baremetal Agent Installation and Configuration Guide, Release 5.3

Download Software:

Go to Cisco Download for UCS Director and first select UCS Director 5.3. Download the Cisco UCS Director Baremetal Agent Patch 5.3.0.0.

Accept the license agreement.

The download will begin.

Next, go back to the main UCS Director download page and select UCS Director 5.2.

Accept the license agreement.

The download will begin.


Leap Second Year – Impact on Cisco Equipment

Our network engineer sent out an email last week about a potential bug due to this year being a Leap Second Year. This wasn’t something I was aware of before, so I did a bit of searching on not only the impact of the bug but also what exactly a Leap Second is. As it turns out, due to variations in the rotation of the planet, atomic time can drift out of sync with the Earth’s rotation. When the difference approaches 0.9 seconds, the International Earth Rotation and Reference Systems Service (IERS) announces that a leap second will be added to the clock.

At midnight on June 30 this year the world atomic clock will have one second added to align it with variances in the Earth’s rotation. This is not the first occurrence: it will be the 26th leap second added since 1972. The last of these changes was in 2012. So what’s the big deal? Well, since the vast majority of computer systems use NTP to lock in their time settings, the additional second will cause the same second to occur twice, and this has the potential to cause some damage or downtime due to reboots. In 2012 some high profile companies such as Qantas, LinkedIn and Yelp suffered outages when their equipment rebooted because it wasn’t able to handle the leap second. Cisco has worked to put either software/firmware updates or workarounds in place to help their customers resolve any potential impact. You can find more information about the Leap Second over on Cisco’s site.
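To make the "same second occurs twice" problem a little more concrete, here is a small sketch using only the Python standard library. It is an illustration of how ordinary software represents time, not a reproduction of the Cisco bug: the helper names are made up for this example, and it only shows that a Unix-style timestamp has no slot for 23:59:60.

```python
import calendar
import time
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"


def posix_seconds(utc_text):
    """Convert a UTC timestamp string to seconds since the Unix epoch."""
    # time.strptime accepts a seconds value of 60, so the leap second parses
    return calendar.timegm(time.strptime(utc_text, FORMAT))


def datetime_can_hold_second_60():
    """Check whether datetime accepts 23:59:60 as a valid time of day."""
    try:
        datetime(2015, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True


leap = posix_seconds("2015-06-30 23:59:60")
after = posix_seconds("2015-07-01 00:00:00")

# The leap second and the first second of July map to the same epoch value
print(leap == after)
print(datetime_can_hold_second_60())
```

Because the leap second and the following midnight collapse onto one timestamp, any software that assumes time always moves forward by exactly one second per second has to cope with a repeat or a step, which is where the trouble starts.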

As soon as I read the email I began to check out which systems are affected by this problem. The focus was obviously on the Cisco equipment within our Flexpod environment. This includes Nexus 7000, Nexus 5000, UCS Manager, UCS Fabric Interconnects, Cisco MDS switches and lastly Cisco UCM sitting on the infrastructure. I’ll go through each system, the symptoms, known affected systems and known firmware fixes. For more information on each component click on the header of the section and it’ll bring you directly to the Cisco bug search site.

Cisco Nexus 7000:

When the leap second update occurs, an N7K SUP1 could have the kernel hit what is known as a “livelock” condition under the following circumstances:

a. When the NTP server pushes the update to the N7K NTPd client, which in turn schedules the update to the kernel. With most NTP servers this push should happen 24 hours before June 30th.
b. When the NTP server actually updates the clock

Workaround:

On switches configured for NTP and running affected code, the following workaround can be used.
1) Remove the NTP/PTP configuration on the switch at least two days prior to the June 30, 2015 leap second event date.
2) Add the NTP/PTP configuration back on the switch after the leap second event date (July 1, 2015).

Known Affected Releases:

5.5(1)E2, 5.5(2), 6.0(4)

Known Fixed Releases:

5.2(6.16)S0, 5.2(7), 6.1(1)S28, 6.1(1.30)S0, 6.1(1.69), 6.2(0.217), 6.2(2)
