
The Life of NetApp – Bring out your dead!

There’s a quality scene in Monty Python’s Life of Brian where the dead are called out to be loaded onto a cart and taken away. Are new players in the market doing the same to NetApp? Even though they continue to say that they’re not dead, everyone is writing them off and chucking them on the death-cart.

It’s easy to see why NetApp is being called to bring out its dead. There are more and more players appearing in the storage market with serious differentiators from NetApp. Just look at the list of potential competitors: Pure Storage, Tintri, SimpliVity, Nutanix and Nimble. And that’s not including the fully software-defined storage groups such as Maxta, Stratoscale and a host of others. There’s also the old adversary EMC. All of these vendors have released new and innovative products in the past year, and they have managed their marketing message far better than NetApp has. NetApp has been painfully slow at getting a smooth transition in place for its 7-Mode customers to Cluster Data OnTap (C-Dot). A lot of critics of NetApp also point to the fact that it is so heavily reliant on the OnTap software. I personally don’t see an issue with that reliance; don’t change something just to create a new release for the sake of it. But the marketing message and the community’s perception of NetApp have caused a number of issues for the company.

Read More


UCS Director – Schedule Database Backup script

I had a problem a while ago where UCS Director crashed during a MetroCluster failover test. It was caused by a delay in the transfer of writable disks on the storage, which in turn caused the VM kernel to panic and set the disk to read-only. After that problem, and due to other restore issues within the infrastructure as well as not having a backup prior to the failover test, I was left with a dead UCS Director appliance. It was essentially completely buggered, as the Postgres database had become corrupt. Cisco support were unable to resolve the problem, and it took a lot of playing around with NetApp snapshots to pull back a somewhat clean copy of the appliance from before the failover test. Really messy, and I wouldn’t recommend it.

Since then I’ve been capturing weekly backups of the UCS Director database to an FTP server so I have a copy of the DB to restore should there be any problems with the appliance again. This script is not supported by Cisco, so please be aware of that before implementing it. To set up the backup, create a DB_BACKUP file in /usr/local/etc with the following:

#!/bin/sh
# Generates an FTP command script and pipes it to the ftp client.
# Arguments: server login password localfile remotefile
upload_script(){
 echo "verbose"
 echo "open $1"
 sleep 2
 echo "user $2 $3"
 sleep 3
 shift 3
 echo "bin"
 echo $*
 sleep 10
 echo quit
}

doftpput(){
 upload_script $1 $2 $3 put $4 $5 | /usr/bin/ftp -i -n -p
}

# Stop the UCS Director services and take a database backup
/opt/infra/stopInfraAll.sh
/opt/infra/dbBackupRestore.sh backup
BKFILE=/tmp/database_backup.tar.gz
if [ ! -f $BKFILE ]
then
  echo "Backup failed."
  # Restart the services before bailing out
  nohup /opt/infra/startInfraAll.sh &
  exit 1
fi
# Timestamp the backup file and upload it to the FTP server
export NEWFILE="cuic_backup_`date '+%m-%d-%Y-%H-%M-%S'`.tar.gz"
export FTPSERVER=xxx.xxx.xxx.xxx
export FTPLOGIN=< ftp user name >
export FTPPASS=<ftp password>
doftpput $FTPSERVER $FTPLOGIN $FTPPASS $BKFILE $NEWFILE
# Restart the UCS Director services
nohup /opt/infra/startInfraAll.sh &

exit 0

Next you’ll need to make the script executable (chmod +x /usr/local/etc/DB_BACKUP) and edit the cron jobs on the appliance. You can use the crontab -e command to edit the schedule settings and enter the following, which runs the backup at 02:01 every Sunday morning:

1 2 * * 0 /usr/local/etc/DB_BACKUP > /dev/null 2>&1

 

And there you go, you now have a weekly scheduled backup of your UCS Director database.
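Since a corrupt appliance is what started all this, it’s also worth confirming that each backup archive is actually readable before trusting it. Here’s a minimal sketch (the `check_backup` helper is my own, not part of UCS Director; on the appliance you’d point it at /tmp/database_backup.tar.gz or at the files landing on the FTP server):

```shell
#!/bin/sh
# Sanity-check a backup archive: confirm the gzip stream is intact and the
# tar index is readable before trusting the backup.
check_backup() {
  if gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
  then
    echo "backup OK: $1"
  else
    echo "backup corrupt or missing: $1"
  fi
}

# Demo against a throwaway archive (the real file on the appliance would be
# /tmp/database_backup.tar.gz):
echo demo > /tmp/demo_file
tar -czf /tmp/demo_backup.tar.gz -C /tmp demo_file
check_backup /tmp/demo_backup.tar.gz
```

Bear in mind this only proves the archive is readable, not that the database dump inside it will restore cleanly; periodically testing a full restore is still the only real guarantee.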



UCS Director – Upgrade Version 5.1 to 5.3

Cisco have recently released a new version of their orchestration product, UCS Director. The new release is version 5.3 and includes a raft of new features, the majority of which are around improved reports and APIC support. Another new feature is support for NetApp OnTap 8.3. My primary reason for performing the upgrade is to leverage the reports and the enhancements to workflow execution. It’s also been almost a year since the 5.1 installation was performed and I want to keep my systems as up to date as possible. I’m currently running UCS Director 5.1.0 and Baremetal Agent 5.0.

Some of the new features in UCSD 5.3 are:

  • Support for C880 M4 Server
  • Support for VersaStack and IBM Storwize
  • Enhancements to EMC RecoverPoint
  • Enhancements to VMware vSphere Support (VSAN Support)
  • Enhancements to Application Controllers (Cisco APIC)
  • Enhancements to workflow execution
  • Enhancements to the script module
  • Enhancements to UCSD REST APIs
  • Enhancements to Managing NetApp Accounts (including support for OnTap 8.3)
  • Enhancements to Cost Models and Chargeback features
  • Changes to Report APIs

You can find more about the features in the release over on the Cisco UCS Director 5.3 Release Notes site.

There are two components to the release, UCS Director itself and the Baremetal Agent upgrade. The supported upgrade paths for both components are:

Cisco UCS Director

Current Release   Direct Upgrade   Supported Upgrade Path
Release 4.0.x.x   No               4.0 > 4.1 > 5.1 > 5.3
Release 4.1.x.x   No               4.1 > 5.1 > 5.3
Release 5.0.x.x   No               5.0 > 5.1 or 5.2 > 5.3
Release 5.1.x.x   Yes              5.1 > 5.3
Release 5.2.x.x   Yes              5.2 > 5.3

Read More

Leap Second Year – Impact on Cisco Equipment

Our network engineer sent out an email last week about a potential bug due to this year being a Leap Second Year. This wasn’t something I was aware of before, so I did a bit of searching into not only the impact of the bug but also what exactly a leap second is. As it turns out, due to variations in the planet’s rotation, atomic time can drift out of sync with astronomical time. When the difference approaches 0.9 seconds, the International Earth Rotation and Reference Systems Service (IERS) announces that a leap second will be added to the clock.

At midnight on June 30 this year the world atomic clock will have one second added to align it to variances in the earth’s rotation. This is not the first occurrence: there have been 25 of these additional seconds added to the atomic clock since 1972, the last of them in 2012. So what’s the big deal? Well, since the vast majority of computer systems use NTP to lock in their time settings, the additional second will cause the same second to occur twice, and this has the potential to cause damage or downtime due to reboots. In 2012 some high profile companies such as Qantas, LinkedIn and Yelp suffered outages when equipment that wasn’t able to handle the leap second rebooted. Cisco has worked to put software/firmware updates or workarounds in place to help their customers resolve any potential impact. You can find more information about the Leap Second over on Cisco’s site.
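If you’re curious whether your own NTP-synced hosts have the insertion flagged, ntpd exposes a two-bit leap indicator you can query with `ntpq -c "rv 0 leap"` (it prints a line such as `leap=01`). The decoding below is a small sketch; the `leap_status` helper is mine, not part of the ntp distribution:

```shell
#!/bin/sh
# Decode the NTP leap indicator bits, as reported by: ntpq -c "rv 0 leap"
# The leap_status helper is illustrative, not part of ntp itself.
leap_status() {
  case "$1" in
    00) echo "no leap second pending" ;;
    01) echo "leap second will be inserted at the end of the day" ;;
    10) echo "leap second will be deleted at the end of the day" ;;
    11) echo "clock not synchronised" ;;
    *)  echo "unknown leap code: $1" ;;
  esac
}

leap_status 01
```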

As soon as I read the email I began to check out which systems are affected by this problem. The focus was obviously on the Cisco equipment within our Flexpod environment. This includes Nexus 7000, Nexus 5000, UCS Manager, UCS Fabric Interconnects, Cisco MDS switches and lastly Cisco UCM sitting on the infrastructure. I’ll go through each system, the symptoms, known affected systems and known firmware fixes. For more information on each component click on the header of the section and it’ll bring you directly to the Cisco bug search site.

Cisco Nexus 7000:

When the leap second update occurs, an N7K SUP1 could have the kernel hit what is known as a “livelock” condition under the following circumstances:

a. When the NTP server pushes the update to the N7K NTPd client, which in turn schedules the update to the kernel. Most NTP servers will have pushed this 24 hours before June 30th.
b. When the NTP server actually updates the clock.

Workaround:

On switches configured for NTP and running affected code, the following workaround can be used:
1) Remove the NTP/PTP configuration on the switch at least two days prior to the June 30, 2015 leap second event.
2) Add the NTP/PTP configuration back on the switch after the leap second event date (July 1, 2015).
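Translated into switch commands, that boils down to something like the following sketch (192.0.2.10 is a placeholder; match it to the ntp server lines in your own running config, and handle any ptp commands the same way):

```
! Before June 28, 2015: remove the NTP configuration
switch# configure terminal
switch(config)# no ntp server 192.0.2.10
switch(config)# end

! After July 1, 2015: add it back
switch# configure terminal
switch(config)# ntp server 192.0.2.10
switch(config)# end
```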

Known Affected Releases:

5.5(1)E2, 5.5(2), 6.0(4)

Known Fixed Releases:

5.2(6.16)S0, 5.2(7), 6.1(1)S28, 6.1(1.30)S0, 6.1(1.69), 6.2(0.217), 6.2(2)

Read More


UCS Central Upgrade Version 1.3 & Overview

While off on annual leave recently I had a few minutes to spare to look through Twitter and came across a tweet from Adam J Bergh (@ajbergh) about a remote code execution vulnerability in Cisco UCS Central. You can read more about the threat over on threatpost.com, but the synopsis is that “an exploit could allow the attacker to execute arbitrary commands on the underlying operating system with the privileges of the root user”. UCS Central version 1.2 and earlier are affected, so it’s time to upgrade, particularly since the vulnerability is rated at the maximum severity score of 10. Before I go on I want to thank Adam for his tweet and for highlighting the issue in the first place.

Pre-Requisites:

There are different steps to perform during the upgrade depending on whether UCS Central is in standalone mode or part of a cluster. You can find more information about both methods over on the UCS Central Install and Upgrade Guide. Some of the key things to keep in mind are the supported upgrade paths and the prerequisites before beginning the upgrade.

Important:

  • UCS Central 1.3 requires a minimum of 12 GB RAM and 40 GB storage space (otherwise the upgrade will fail)
  • Use the ISO image for an upgrade to UCS Central
  • After the upgrade clear the browser cache before logging into the Cisco UCS Central GUI
  • Make sure UCS Manager is 2.1(2) or newer
  • Make sure to take a full state backup before starting the Upgrade Process
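Before kicking off the upgrade you can sanity-check the RAM and disk minimums from the appliance shell. This is a rough sketch of my own, assuming a standard Linux appliance shell; it isn’t a Cisco-provided check:

```shell
#!/bin/sh
# Rough pre-flight check for the UCS Central 1.3 minimums (12 GB RAM, 40 GB disk).
# Note: /proc/meminfo reports slightly less than physical RAM, so a small
# shortfall against the 12 GB figure may still be fine on a 12 GB VM.
MIN_RAM_KB=$((12 * 1024 * 1024))
MIN_DISK_KB=$((40 * 1024 * 1024))

ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
disk_kb=$(df -kP / | awk 'NR==2 {print $2}')

[ "$ram_kb" -ge "$MIN_RAM_KB" ]   || echo "WARNING: only ${ram_kb} kB RAM"
[ "$disk_kb" -ge "$MIN_DISK_KB" ] || echo "WARNING: only ${disk_kb} kB on /"
echo "pre-flight check complete"
```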

Upgrade Paths:

  • From 1.1(2a) to 1.3(1a)
  • From 1.2 to 1.3(1a)

Note: I’m running version 1.1(2a)

New Features:

Some of the new features in version 1.3 include:

  • HTML5 UI: New task based HTML5 user interface.
  • KVM Hypervisor Support: Ability to install Cisco UCS Central in KVM Hypervisor
  • Scheduled backup: Ability to schedule domain backup time. Provides you flexibility to schedule different backup times for different domain groups.
  • Domain specific ID pools: The domain specific ID pools are now available to global service profiles.
  • NFS shared storage: NFS is now supported in place of RDM for the shared storage required by a Cisco UCS Central cluster installation for high availability.
  • vLAN consumption for Local Service Profiles: Ability to push vLANs to the UCS Manager instance through Cisco UCS Central CLI only without having to deploy a service profile that pulls the vLANs.
  • Support for Cisco M-Series Servers.
  • Connecting to SQL server that uses dynamic port.
  • Support for SQL 2014 database and Oracle 12c Database.

I’m really looking forward to seeing what the new HTML 5 UI is like. The initial screenshots I’ve seen are awesome. There’s a nice little introduction from Cisco over on their support site. Also, Jacob Van Ewyk has written a really informative article over on Cisco Communities with details about the UCS Central User Interface Reworked with UCS Central 1.3.

Upgrade Steps:

Read More


How To: Cisco UCS Firmware Upgrade

Edit: 22-Jan-2016

A recent comment highlighted that I was missing a step during Step 8: UPGRADE THE FABRIC INTERCONNECTS AND I/O MODULES. The missing step was to manually fail the primary role over to the recently upgraded Fabric Interconnect, so that your environment remains accessible while the second FI is being upgraded. I’ve updated that step to include the manual failover on the FI. It’s a step I always follow when performing upgrades, but I can’t think of why I didn’t originally put it in the post.


Recently I was tasked with performing an upgrade to our UCS environment to support some new B200 M4 blades; the current firmware version only supported the B200 M3 blades. As part of the process I performed the steps below to complete the upgrade, split into a planning phase and an implementation/upgrade phase. Upgrading the firmware of any system can lead to potential outages. Things have definitely improved on this front over the past decade, but it’s imperative that you plan correctly to make the implementation process go without any hitches. With UCS firmware there is a specific order to follow during the upgrade process:

  1. Upgrade UCS Manager
  2. Upgrade Fabric Interconnect B (subordinate) & Fabric B I/O modules
  3. Upgrade Fabric Interconnect A (primary) & Fabric A I/O modules
  4. Upgrade blade firmware by placing each ESXi host into maintenance mode in turn

During the upgrade process, and particularly during the Fabric Interconnect and I/O module upgrades, you will see a swarm of alerts coming from UCSM. This is expected, as some of the links will be unavailable during the upgrade, so wait until all the steps have been completed before raising an issue with support. As a caveat, however, if anything really stands out as not being right, open a call with support and get it fixed before proceeding to the next step. During my upgrade some of the blades showed critical alerts that did not clear, and those blades needed to be decommissioned and re-acknowledged to resolve it. This problem stood out, and it wasn’t hard to recognise that it was more serious than the other alerts.
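If you prefer to keep an eye on these faults from the CLI rather than the UCSM GUI, something along these lines should work (a sketch following the usual UCSM CLI scope conventions):

```
UCS-A# show fault
UCS-A# scope chassis 1
UCS-A /chassis # show fault
```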

PLANNING PHASE:

You will need to check the release requirements from Cisco regarding the upgrade path from your current firmware version to the desired version, and also the capabilities that the desired firmware version contains. The upgrade process takes some time, so it’s best to review everything in advance rather than on the day of the upgrade itself. For instance, during this upgrade I needed to ensure support for B200 M4 blades and VIC 1340 modules, as they had been purchased as part of an order for a new project. The steps to carry out in the planning phase are:

Step 1: Verify the Upgrade requirements

Check the release version notes on Cisco’s website for the version you want to upgrade to. For this example we are upgrading to version 2.2

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/ucs_2_2_rn.html

Read More

Fix: Cisco UCSM – Default Keyring’s certificate is invalid

Recently, during some prep work for a UCS firmware upgrade, I noticed a major alert showing that the default keyring’s certificate was invalid. At first I was a bit concerned, but since it didn’t affect my login to UCS Manager I assumed it wasn’t too serious. After a bit of searching around the internet I found the CLI Configuration Guide for UCS on Cisco’s site (page 6), which shows the quick and easy fix to the problem.


Open an SSH session to the IP address/hostname of UCS Manager. It will connect to the primary Fabric Interconnect. Enter the commands in the order of steps below:

Step 1 UCS-A# scope security

Step 2 UCS-A /security # scope keyring default

Step 3 UCS-A /security/keyring # set regenerate yes

Step 4 UCS-A /security/keyring # commit-buffer

After the commit-buffer command has been issued all GUI sessions will be disconnected and you will need to log in again. When you next log in you’ll be prompted to accept the new certificate. Once accepted, UCSM will open and the alert will be gone.
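If you want to double-check the regenerated certificate from the same SSH session, something along these lines should work (following the usual UCSM CLI conventions, where show detail prints an object’s attributes):

```
UCS-A# scope security
UCS-A /security # scope keyring default
UCS-A /security/keyring # show detail
```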

All fairly quick and painless!


Podcast review – Cisco Champion Radio

Cisco Champion Radio is a spin-off from Cisco’s community recognition program that gets community members and SMEs together to discuss Cisco-related products and releases, tools and educational services. It covers the entire gamut of Cisco products, from core routing and switching to the data center to unified communications. The technology and product podcasts are interspersed with community-related information. One episode of particular note was S2 E7 – Creating videos & podcasts as IT professionals. It features a number of Cisco community members who run, edit or produce their own podcasts or video content, discussing the best tools for the job. The main advice from everyone involved was to just get out there and begin to make content. It may not be super quality at the beginning, but you will improve over time. The other primary comment from everyone was to make sure that you get good audio. Even for video content the visuals can be jumpy or glitchy, but if the audio drops out so will your listeners/viewers.

And that leads me to my one complaint about Cisco Champion Radio: the audio. The discussion is carried out over Talkshoe and it just sounds really tinny. There are times where it sounds a bit fuller, but generally it’s tinny. I know the producers are looking at other options, and it was even part of the episode 7 discussion, so hopefully another platform can be utilised in the future. It’s not the first community podcast I’ve heard run across Talkshoe with similar audio quality; the APAC Virtualization podcast suffered a similar fate. Which just goes to show how compelling the content of both podcasts is if you can sit through tinny audio.

So what are Cisco Champions? Cisco Champions are the equivalent of VMware vExperts. They are community members who discuss, review, promote and critique Cisco’s product range. I think their own description explains it far better than I can:

Cisco Champions are a network of people who are passionate about Cisco and enjoy sharing their knowledge, expertise, and thoughts across the social web and with Cisco. The Cisco Champions program encompasses different areas of interest and geographical areas within the company, providing a variety of opportunities for Champions to participate in the program. To learn more about the Cisco Champions program go to https://communities.cisco.com/groups/cisco-champions

The format of the podcast is a question-and-answer style, with one leader from the Cisco Champions program who mediates the conversation and keeps things moving. There are usually 2 or 3 guests on the show, one being an SME from Cisco in a particular product or area and the others being Cisco Champions who work for other vendors or service providers. The content of the episodes ranges from deeply technical to high-level overviews. All the guests are really clued in, have a deep understanding of the subject being discussed and provide some real food for thought. I’d recommend listening to S2 Episode 4 about VIRL for a prime example of this. It was a really in-depth discussion and made me want to go and get my hands on VIRL. I was thinking about it before, but now it’s definitely on the cards.

One of the advantages of Cisco Champion Radio is that it’s not a closed podcast: it’s possible for people to join in live, listen to the discussion and ask the guests questions. This expands the scope of the podcast to really engage with the community in real time; most other podcasts can only do that after the episode has been produced. If you want to attend any of the Cisco Champion Radio episodes live you can access the details on how to attend from https://communities.cisco.com/docs/DOC-56977. This is one advantage that Talkshoe does provide over some other platforms. I’m fairly new to Cisco equipment, especially compared with other engineers out there, so pretty much all the content is of interest to me as I try to expand my knowledge beyond data center technologies and into the networking sphere. Thankfully every possible product and solution is covered on Cisco Champion Radio, so if something hasn’t come up before now I’m sure it won’t be long before it is covered by the team.

If you want to access any of the back-catalog of Cisco Champion Radio episodes head on over to https://communities.cisco.com/docs/DOC-51556 and select a link or just add Cisco Champion Radio to your iTunes podcast list. I’d highly recommend it.



Cisco ACI

Earlier this week I was fortunate enough to be invited to an ACI Test Drive run by Firefly on behalf of Cisco. I recently attended the Cisco Roadshow in Melbourne and was really interested in the speech by Dave Robbins around ACI. I’ve read quite a bit about ACI recently but wasn’t able to really picture in my head what it is, how it works and what benefits, if any, it can provide for me. Cisco have been pushing ACI hard across the various media streams lately, and there has also been quite a bit of discussion around the competition between Cisco, with its ACI fabric, and VMware’s NSX network virtualisation software. I’ve heard about NSX but haven’t had a chance to play about with it yet. When the opportunity arose to join a test drive workshop on ACI it was too good to miss, so I jumped at the chance. My background is not in networking but in virtualisation, compute and storage, so I thought it would be a good opportunity to brush up on my networking skills at the same time. It’s definitely my weakest area, so I’ve made a commitment to myself to work on my networking knowledge and understanding as much as I can.

Cisco ACI Leaf-Spine architecture

What is ACI?

ACI is Cisco’s new vision for managing its data center networks into the future. It is an application-centric, software/policy-driven, leaf-spine architecture that abstracts the logical definition of the network from the physical hardware to provide re-usable and extensible policies for quick deployment of network infrastructure. ACI extends the principles of Cisco UCS service profiles to the entire network fabric. The Nexus 9000 series released by Cisco earlier this year is at the core of the new ACI platform. There are a number of 9000 series switches in the family, which I’ll go into in more detail soon, but one item of note is that the Nexus 9000 switches can run in standalone mode with NX-OS installed or (with two exceptions) in ACI mode, which involves having the ACI ASIC installed in the switch. The line speed of the standalone version is phenomenal, and developers can run their own custom code on it. This opens up a great number of possibilities and puts it into direct competition with Arista, who already have this capability. The Nexus 9000 comes with a merchant Broadcom chip for standalone mode and adds the ACI ASIC for ACI fabric operations.

Read More