
Fix: NetApp DataFabric Manager Certificate has expired

Following the upgrade of DFM from version 5.2.0 to 5.2.1 I started to see a warning in the OnCommand Management Console that the NetApp DataFabric Manager certificate had expired and that a new one should be created.

[Image: dfm-cert-failure]

Surprisingly, the cert had expired ages ago but neither I nor anyone else had noticed. The first step in fixing the issue was to check the SSL service details to find the expiry date of the current certificate. To do this, open a command prompt and run:

dfm ssl service detail

If the certificate's "not valid after" date has already passed (in my case, Dec 9 2015), then a new one needs to be created.

[Image: dfm-check-cert]

The steps to create a new certificate are:

dfm ssl server setup
KeySize: 2048
Country Name: AU (or whatever two-letter country code suits your needs)
State or Province: <insert your state name>
Locality Name: <insert your city>
Organization Name: <insert company name>
Common Name: <insert FQDN of your DFM server>
Email Address: <insert your address>

Once the cert has been created you'll be prompted to restart the HTTP services.

[Image: dfm-check-cert1]
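If you'd rather restart the web service manually from the command line, something along these lines should work (a rough sketch; service names can vary slightly between DFM versions):

dfm service stop http
dfm service start http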

Once you restart the services you can acknowledge the alert in the OnCommand Management Console and it will be gone.


How To: End-to-End SnapProtect Storage Policy Creation

The example I'm going to give here is for an environment that is already configured, but whose storage controllers are not yet configured as NAS iDataAgents for backup, and neither are any of the volumes on those controllers. In this environment the controller I'm enabling backups on is the secondary storage tier, which is already a SnapVault destination and so is already in the Array Manager for SnapProtect. New NetApp aggregates also need to be added as resource pools, as new volumes and data have been assigned to those aggregates. The process involves working with SnapProtect, the NetApp Management Console (DFM) and the NetApp storage controllers.

Enable backups on a controller within SnapProtect:

Enable Accounts

Before you begin, log onto the controller and ensure the login account you require has access. To do this, open an SSH session to the controller and use the following command:

#useradmin user list

If the account used by SnapProtect doesn't exist then you'll need to add it as an administrator. Below are screenshots from a controller where the account has been added and from one where it hasn't.

[Image: SnapProtect End to End Step 1]

[Image: SnapProtect End to End Step 2]

You can add the account via the command line or via the web console. In this instance I added the account via the console. In System Manager go to Configuration -> Local Users and Groups -> Users and click Create. Enter the required details and click Create.

[Image: SnapProtect End to End Step 3]
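For reference, the same account can also be created from the CLI; a rough equivalent is below, where the username is just a placeholder for your SnapProtect service account (you'll be prompted to set a password):

useradmin user add snapprotect_svc -g Administrators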

Once completed you can re-run the useradmin user list command from the CLI to ensure that the account now appears.

Read More


How to: Present iSCSI storage from a NetApp vfiler (7-Mode)

As part of a recent data migration I had to enable a vfiler to allow iSCSI traffic, as a number of virtual machines in the environment require block storage for clustering reasons. The vfiler already presents NFS and CIFS. As this is a test environment I've decided to put iSCSI on the same link as the NFS and CIFS traffic. I know this is not normal best practice, but given that the VLANs are already in place and that this is a test environment I decided to use the same IP address range. The servers accessing the iSCSI LUNs don't already have access to CIFS shares or NFS mounts, so there should be no traffic cross-over. So, onto the steps to set it up:

Step 1: Allow the iSCSI protocol and RSH on the vfiler (run at vfiler0)

Check the status of the vfiler using the command

vfiler status -a tenant_vfiler
tenant_vfiler running
 ipspace: tenant_vfiler_NFS_CIFS
 IP address: 192.168.2.1 [a1a-107]
 IP address: 192.168.2.2 [a1a-107]
 Path: /vol/tenant_vfiler_vol0 [/etc]
 Path: /vol/nfs03
 Path: /vol/nfs04
 Path: /vol/nfs02
 Path: /vol/nfs01
 Path: /vol/cifs01
 Path: /vol/iso01
 Path: /vol/iscsi_test
 UUID: 93c62e36-4e76-11e4-8721-123478563412
 Protocols allowed: 7
 Disallowed: proto=rsh
 Allowed: proto=ssh
 Allowed: proto=nfs
 Allowed: proto=cifs
 Disallowed: proto=iscsi
 Allowed: proto=ftp
 Allowed: proto=http
 Protocols disallowed: 2

Next run the commands:

vfiler allow tenant_vfiler proto=iscsi
vfiler allow tenant_vfiler proto=rsh
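Once allowed, re-running the status command from earlier should now list proto=iscsi and proto=rsh under Allowed:

vfiler status -a tenant_vfiler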

Step 2: Start the iSCSI protocol on the vfiler (run within tenant_vfiler)

vfiler context tenant_vfiler
iscsi start
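To confirm the service is actually running within the vfiler context, a quick check like this should do the job (iscsi nodename also shows the target IQN you'll need on the host side later):

iscsi status
iscsi nodename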

Step 3: Create a new volume at vfiler0

vfiler context vfiler0
vol create iscsi_test <aggregate_name> 20g

Step 4: Add the volume to tenant_vfiler and switch into the vfiler to check the volume status

vfiler add tenant_vfiler /vol/iscsi_test
vfiler context tenant_vfiler
vol status

Step 5: Set priv advanced and modify the exports to the correct settings as below

To modify the exports, set the privilege level with priv set advanced, read the current /etc/exports with rdfile and write the updated file back with wrfile. Once done, run the exportfs -av command to push the changes out:

rdfile /vol/tenant_vfiler_vol0/etc/exports
/vol/nfs01 -sec=sys,rw=192.168.1.0/24,anon=0
/vol/nfs02 -sec=sys,rw=192.168.1.0/24,anon=0
/vol/nfs03 -sec=sys,rw=192.168.1.0/24,anon=0
/vol/nfs04 -sec=sys,rw=192.168.1.0/24,anon=0
/vol/iso01 -sec=sys,rw=192.168.1.0/24,anon=0
/vol/iscsi_test -sec=sys,rw=192.168.1.0/24,anon=0
vfiler run tenant_vfiler exportfs -av
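For what it's worth, rather than rewriting the whole file you can append just the new export line with wrfile -a and then push it out with exportfs -av as above; a rough sketch using the same path and network:

wrfile -a /vol/tenant_vfiler_vol0/etc/exports /vol/iscsi_test -sec=sys,rw=192.168.1.0/24,anon=0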

Step 6: Create a LUN in the volume (iscsi_test)

vfiler run tenant_vfiler lun create -s 10g -t windows_2008 /vol/iscsi_test/iscsi_lun

Step 7: Change context and run lun show to verify the LUN

[Image: lun_show]
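In case the screenshot is hard to read, the same check can be run from vfiler0 with:

vfiler run tenant_vfiler lun show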

Step 8: Verify the iSCSI network within VMware has been assigned to the VM

[Image: iSCSI network]

Step 9: Enable the iSCSI Initiator on the server and grab the IQN

[Image: iSCSI initiator iqn]

Step 10: Create an igroup with the iqn of the server

igroup create -i -t windows ds_iscsi
igroup add ds_iscsi iqn.1991-05.com.microsoft:microsoft:server.domain.com
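To double-check the igroup before mapping, igroup show should list ds_iscsi with the initiator IQN attached:

igroup show ds_iscsi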

Step 11: Map the LUN to the igroup

[Image: map_lun_to_group]
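Run in the same context as the igroup commands above, the mapping would look roughly like this (LUN ID 0 is just an example):

lun map /vol/iscsi_test/iscsi_lun ds_iscsi 0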

Step 12: run lun show -m to check the mapping

[Image: lun_show_mapping]

Step 13: Run a quick connect to the IP address of the controller

[Image: iscsi_quick_connect]
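If you'd rather script this step than use the Quick Connect GUI, the built-in iscsicli tool can do the same job; a rough sketch, using one of the vfiler IP addresses from earlier as the portal and a placeholder for the target IQN reported by iscsi nodename:

iscsicli QAddTargetPortal 192.168.2.1
iscsicli ListTargets
iscsicli QLoginTarget <target_iqn>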

And now your disk should appear in Disk Management on the server. It's not too different from setting up a normal iSCSI connection, but RSH must be enabled on the vfiler, otherwise the iSCSI requests can't be tunnelled through to the vfiler's iSCSI target.


UCS Director 5.4 – Post upgrade NetApp storage connection issue

After a recent upgrade to UCS Director 5.4 I noticed that my storage connections were showing a status of failed on the dashboard. I went to Administration -> Physical Accounts -> Physical Accounts and all of my NetApp controllers were offline.

[Image: NetApp connection fail UCS Director]

I went to edit settings and re-entered my password to make sure that it had been picked up correctly.

[Image: NetApp UCSD Edit settings]

All the settings were fine, so I saved them and tested the connection to the controllers again.

[Image: NetApp UCSD Test Connection]

The connection failed with the following error:

500 Connection has been shutdown: javax.net.ssl.SSLHandshakeException:

Server chose SSLv3, but that protocol version is not enabled or not supported by the client.

Read More


VMware Metro Storage Cluster Overview

VMware Metro Storage Cluster

VMware Metro Storage Cluster (vMSC) allows a vSphere cluster to stretch across two data centers in geographically dispersed locations. In normal circumstances, in vSphere 5.5 and below at least, vCenter would be deployed in Linked Mode so that two vCenters can be managed as one. With vMSC, however, it's possible to have one vCenter manage all resources across two sites and leverage the underlying stretched storage and networking infrastructure. I've written previous posts on NetApp MetroCluster describing how a stretched storage cluster spans two disparate data centers. I'd also recommend reading a previous post on vMSC by Paul Meehan over on www.virtualizationsoftware.com. The idea behind this post is to provide the VMware view for the MetroCluster posts and to give a better idea of how MetroCluster storage ties into virtualization environments.

The main benefit of a stretched cluster is that it enables workload and resource balancing across data centers. This helps companies reach near-zero RTOs and RPOs and ensure uptime of critical systems, as workloads can be migrated easily using vMotion and Storage vMotion. One thing to keep in mind regarding vMSC: it's not really sold as a disaster recovery solution but rather as a disaster avoidance solution when linked with the underlying storage. Some of the other benefits of a stretched cluster are:

  • Workload mobility
  • Cross-site automated load balancing
  • Enhanced downtime avoidance
  • Disaster avoidance
  • System uptime and high availability

There are a number of storage vendors that provide the back-end storage required for a vMSC to work. I won't go through the entire list, but you can find out more on the VMware Compatibility Matrix site. The one I have experience with is NetApp MetroCluster, but I know of others from EMC and Hitachi at least. So what components make up a vMSC? It comes down to an extended Layer 2 network across data centers, so that vMotions can take place with ease, and a resilient storage platform connected to ESXi via VMFS or NFS datastores. VMware vCenter itself does need some configuration changes, but nothing outside the scope of what a regular VMware admin can implement. A view of what a vMSC looks like is below; the networking and storage components have been simplified.

[Image: fabric metro cluster diagram]

 

Read More

NetApp MetroCluster Overview – Part 8 – Further Reading


Here are some links to further reading that will help with getting a far deeper understanding of MetroCluster:

High Availability and MetroCluster Configuration Guide

MetroCluster Best Practices for Implementation

Configuring a stretch MetroCluster system with SAS disk shelves

Installing FC-to-SAS bridges and SAS disk shelves

A Continuous-Availability Solution for VMware vSphere and NetApp

MetroCluster Plug-in 1.0 for vSphere

MetroCluster on clustered Data ONTAP was not covered as part of this series, but if you're looking at cDOT MetroCluster the documents below may be useful:

Data ONTAP 8.3

MetroCluster Management and Disaster Recovery Guide

MetroCluster Installation Express Guide

MetroCluster Installation and Configuration Guide


NetApp MetroCluster Overview – Part 7 – MetroCluster Tools

 

There aren't many tools available specifically for MetroCluster, but I've added the ones I found below. If anyone knows of any others please let me know and I'll update this post.

FMC_DC
The FMC_DC can be downloaded from here -> http://mysupport.netapp.com/NOW/download/tools/FMC_DC/. It will require a NetApp NOW account.

Fabric MetroCluster Data Collector

The FMC_DC is the Fabric MetroCluster Data Collector, which can be configured to gather information on all components (controllers, switches, bridges, etc.) of the MetroCluster infrastructure. Once the components have been added, a health check can be run. This health check appears as a card in the application and will show whether the components are healthy or need further investigation.

I'd recommend having a look over this document to get started with FMC_DC:

http://community.netapp.com/t5/Developer-Network-Articles-and-Resources/FMC-DC-Starter-Guide/ta-p/86351

While the FMC_DC doesn’t provide any management features it does provide peace of mind that all components are configured so that failover can be successful. If you’re doing a DR test I’d definitely recommend using it.

Read More


NetApp MetroCluster Overview – Part 6 – Best Practices and Recommendations

 

These are some of the things to look out for with MetroCluster and can be considered best practices and recommendations.

Disable change_fsid

One very important configuration change to make on MetroCluster controllers is to immediately disable the change_fsid option. If it is not disabled, all volumes and LUNs will be given new file system IDs during a failover, making it impossible to reference the volumes and LUNs by their previous identities. This is especially critical for LUNs.

To avoid the FSID change in the case of a site takeover, you can set the change_fsid option to off (the default is on). Setting this option to off has the following results if a site takeover is initiated by the cf forcetakeover -d command:

  • Data ONTAP refrains from changing the FSIDs of volumes and aggregates.
  • Users can continue to access their volumes after site takeover without remounting.
  • LUNs remain online.

If you don’t disable the change_fsid option in MetroCluster configurations the following happens when the cf forcetakeover -d command is run:

  • Data ONTAP changes the file system IDs (FSIDs) of volumes and aggregates because ownership changes.
  • Because of the FSID change, clients must remount their volumes if a takeover occurs.
  • If using Logical Units (LUNs), the LUNs must also be brought back online after the takeover.
To disable the option, run the following on both controllers in the pair:

options cf.takeover.change_fsid off
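You can confirm the setting by querying the option on each controller; running options with just the option name prints its current value:

options cf.takeover.change_fsid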

MetroCluster RC file
Read More


NetApp MetroCluster Overview – Part 5 – Failure Scenarios for MetroCluster

 

Failover/Failure Scenarios for MetroCluster

I'm not going to re-invent the wheel here. These failure scenarios are all pretty self-explanatory and can be found in TR-3788. There are far more scenarios in that document, but here I'll cover some of the most common types.

Scenario: Loss of power to disk shelf

[Image: MetroCluster Failure Disk Shelf]

Expected behaviour: The relevant disks go offline and the plex is broken. There's no disruption to data availability for hosts running HA (VMware High Availability) or FT (Fault Tolerance), and no change is detected by the ESXi server. When the shelf is powered back on the plexes will resync automatically.

Impact on data availability: None

 

Scenario: Loss of one link in one disk loop

[Image: MetroCluster Failure Inter-Switch Link]

Expected behaviour: A notification appears on the controller advising that disks are only accessible via one switch. There's no disruption to data availability for hosts running HA or FT, and no change is detected by the ESXi server. When the connection is restored, an alert on the controller will advise that connectivity is back across two switches.

Impact on data availability: None

 

Scenario: Failure and failback of a storage controller

Read More


NetApp MetroCluster Overview – Part 4 – Cabling of Fabric MetroCluster

 

Cabling of Fabric MetroCluster

The cabling of a MetroCluster is key. Outside of some licensing, the cabling is really the only difference between a MetroCluster and a mirrored HA pair. Yes, failover and failback are a bit more complex, but from a setup point of view the main difference is the cabling. There are a large number of cables, and the configuration should all be mapped out before you start putting equipment into your racks. I would heartily recommend reading the MetroCluster and High Availability Guide to understand your cabling requirements before starting. Below is not a how-to on connecting everything; it's just an overview with a brief explanation. The above NetApp document is very detailed and should answer any questions you may have.

A simplified view of a Fabric MetroCluster is as follows:

[Image: Fabric Switch]

I found this workflow in the NetApp documentation, which is quite useful as a guideline for how the bridges should be cabled.

[Image: Cable connection workflow]

Read More