Orphaned Snapshot Removal – Identify Orphans

I’ve been banging my head against the screen for the past few weeks looking at storage issues and finding orphaned volumes with reams of snapshots using up valuable disk space. In some cases it was due to manual intervention and a snapmirror or snapvault relationship was broken, in others it was caused by DFM creating new instances of the volume but not cleaning up old volumes and associated snapshots and in other cases, well I’ve no idea how they occurred. Hence why I’ve been slapping my brain around the inside of my skull. I’d be interested to know if this is still an issue with OCUM, answers on a postcard.

There’s no pretty way to clean up orphaned snapshots that are essentially owned by DFM. It’s messy, convoluted and requires that you’re very careful and precise about what you’re removing, otherwise you’ll make things worse. There are a number of reasons why orphans can occur. One is down to the way SnapProtect and DFM work together. If a VM is deleted or moved to another volume and no other VMs that are part of the same backup subclient exist on that volume, the snapshots will not age and will require a manual clean-up process. This seems to limit the use of automated DRS in VMware, but that’s a separate issue really. Another reason, and what looks to be the cause in my case, is that DFM has intermittent issues communicating with the storage controller and thinks the volume doesn’t exist, so DFM may create a FlexClone of the volume and index it with a new suffix while still being able to access the snapshots that were already captured. This can be caused by network drop-outs at the controllers or by the CPUs maxing out and not being able to reply to DFM. I’m still investigating the cause of this. If a new storage policy was created in SnapProtect with these volumes assigned it would clear out the orphans, but that would involve re-baselining the backups, which is not something you’d want to do, unless of course that data had no value to you.

Before beginning this procedure, open a connection to the OnCommand server and open the DFM console, the OnCommand manager and NetApp System Manager 3.1. System Manager will give you insight into the volumes and what snapshots have been completed. DFM will show the backup datasets and their relationships, and the OnCommand manager will provide some further details on capacity etc. Within System Manager you may find that a snapshot has not aged, and notice that the snapshots on the volume are much older than the other vault copies that are continually updated as part of the SnapProtect subclients and DFM relationship.

You need to find the vaulted volume on the backup storage controllers, in this case the volume is lore_2. If multiple vault copies have been created the backup controller will have created multiple volumes and will normally append _1, _2 or _3 and so on. A new increased digit gets added as a suffix to the volume name. Another example is a volume called backup but on the snapmirror location it’s called backup_5.
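To make the suffix convention a bit more concrete, here is a minimal Python sketch that groups a hand-typed list of volume names by their base name, so that any base with more than one copy stands out. The function names and the sample names (including `lore_1`) are hypothetical, and it assumes a trailing `_<digits>` is always a vault copy suffix, which won’t hold if a volume legitimately ends in a number.

```python
import re
from collections import defaultdict


def group_by_base(volume_names):
    """Group volume names by base name, stripping a trailing _<digits> suffix."""
    groups = defaultdict(list)
    for name in volume_names:
        match = re.fullmatch(r"(.+)_(\d+)", name)
        # Assumption: a trailing _<digits> is the suffix added for a new copy.
        base = match.group(1) if match else name
        groups[base].append(name)
    return dict(groups)


def multiple_copies(volume_names):
    """Return only the base names that have more than one volume."""
    return {
        base: sorted(names)
        for base, names in group_by_base(volume_names).items()
        if len(names) > 1
    }


# Hypothetical names typed in by hand, not real controller output.
volumes = ["lore_1", "lore_2", "backup_5"]
print(multiple_copies(volumes))
```

This only narrows down where to look. You still need to check the snapshot copies on each volume to confirm which one has stopped updating.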

In this example we need to check for the correct volume that relates to the vault copy that needs to be removed from the dataset in DFM. We have 2 lore volumes so we need to be careful which one we manually modify and clear. By checking the snapshot copies I can verify which volume needs to be cleaned up: it is the one that has not been updated, and it needs to be removed in DFM.

Orphaned Snapshot Removal Step 1

Next you’ll need to locate the dataset within DFM so that the vault relationship can be modified. Connect to the DFM server and open the DFM management console. Click on Data/Datasets and then click on the Provisioning tab. Click on each of the datasets, looking for the volume name on the backup controller. In this example we are looking for lore_2.

Orphaned Snapshot Removal Step 2

This vault relationship has been created by CC-xxxxxxxxxx_SS-xx_SC-61. Now we need to remove the relationship from DFM.

Orphaned Snapshot Removal – DFM Relationships

Once the correct volumes have been identified it’s now time to remove the relationship from DFM. Open CMD on the DFM server

List the datasets to get the ID

C:\Users\TEMP>dfpm dataset list

Orphaned Snapshot Removal Step 3

Get the ID of the dataset. In this example ID:22295. Now we need to get the vault relationships.

C:\Users\TEMP>dfpm dataset list -m 22295
For the Backup: C:\Users\TEMP>dfpm dataset relinquish 22299
For the Mirror: C:\Users\TEMP>dfpm dataset relinquish 22296

Orphaned Snapshot Removal Step 4

Now click back on the Data/Dataset/Overview from the DFM console. Right click and Edit the dataset: in this example CC-xxxxxxxxxx_SS-xx_SC-61

Orphaned Snapshot Removal Step 5

Remove the volume from the primary physical resource, the secondary and the mirror resources. The screenshots below only show the primary data being cleared. Do the same for Backup and Mirror.

Orphaned Snapshot Removal Step 6

Orphaned Snapshot Removal Step 7

Orphaned Snapshot Removal Step 8

Orphaned Snapshot Removal Step 9

Orphaned Snapshot Removal Step 10

Orphaned Snapshot Removal – Filer Clean-up

Log onto the storage controllers via SSH. In this case I have two controllers, one that is handling snapvaults and one that is handling snapmirrors. Please note that if you only have a snapvault or a snapmirror then just follow those steps as the other ones will not be relevant. I’ve marked out which controller the commands should be executed on.

Orphaned Snapshot Removal Step 11

On the snapvault controller run the following:

Run the command: snapvault status. You’ll notice that the lag time is very high. This will help to confirm which snapshot volume you are working with.
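If there are a lot of relationships to eyeball, a rough Python sketch like the one below can flag the ones with a large lag. It assumes the lag is shown as hours:minutes:seconds with an hours field that can run well past 24, and that you have copied the destination and lag columns out by hand. The function names, the two-day threshold, the lag values and the `lore_3` row are all made up for illustration.

```python
from datetime import timedelta


def parse_lag(lag):
    """Convert an 'hours:minutes:seconds' lag string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def stale_relationships(rows, max_lag=timedelta(days=2)):
    """Return the destinations whose lag exceeds max_lag.

    rows is an iterable of (destination, lag_string) pairs copied by hand.
    """
    return [dest for dest, lag in rows if parse_lag(lag) > max_lag]


# Hypothetical values for illustration only, not captured output.
rows = [
    ("ontap01:/vol/lore_2/lore", "2184:10:05"),
    ("ontap01:/vol/lore_3/lore", "03:12:44"),
]
print(stale_relationships(rows))
```

Pick a threshold that suits your own backup schedule. A relationship that updates nightly should never be days behind.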

Orphaned Snapshot Removal Step 12

Run the command: snap list <volumename> e.g. snap list lore_2. Any volumes that are still locked by either snapmirror or snapvault will need to have the process disabled. Copy the full name of the qtree path so you can run the snapvault stop command.

snapvault stop <controller_name>:/vol/local_volume/primary_controller_volume
snapvault stop ontap01:/vol/lore_2/lore

Orphaned Snapshot Removal Step 13

Orphaned Snapshot Removal Step 14

On the snapmirror controller run the following:

On the snapmirror controller you need to break the snapmirror relationship. First you can run the command: snapmirror status -l lore_2 and this will show the full snapmirror status.

Orphaned Snapshot Removal Step 15

Break the snapmirror by running the break command referencing the controller and volume name.

snapmirror break backup_controller:lore_2

Orphaned Snapshot Removal Step 16

Run snapmirror status -l on the volume and ensure that it is in the broken-off state.

Orphaned Snapshot Removal Step 17

On the snapvault controller run the following:

On the snapvault controller you can begin to delete the snapshots using the snap delete command:

snap delete <volume_name> <snapshot_name>

snap delete lore_2 lore_2-base.-0

Orphaned Snapshot Removal Step 18

Once they are all removed you will see no further snaps listed for lore_2.

On the snapmirror controller run the following:

Back on the snapmirror controller you will need to edit the snapmirror.conf file to remove any reference to the volume. In this case that’s lore_2. To do this run:

rdfile /etc/snapmirror.conf

Copy the contents to a text file, remove what is no longer required and run the command

wrfile /etc/snapmirror.conf

Paste the contents of the text file in. Press Enter and then Control+C. This will save the file. Run the rdfile command again to ensure that everything looks correct. The new version of the file should have no reference to the volume you want to remove.
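If you’d rather not edit the copied text by eye, the sketch below shows one way to strip the entries in Python before pasting the result back in. It assumes each entry starts with a source and a destination field, written either as host:volume or host:/vol/volume/qtree, followed by the arguments and schedule. The helper names and the sample hostnames are hypothetical. It matches the volume name exactly, so removing lore_2 won’t also remove a lore_20.

```python
def volume_of(endpoint):
    """Extract the volume name from 'host:volume' or 'host:/vol/volume/qtree'."""
    _, _, path = endpoint.partition(":")
    if path.startswith("/vol/"):
        return path.split("/")[2]
    return path


def drop_volume(conf_text, volume):
    """Return conf_text without the entries whose source or destination is volume."""
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Comment lines and blank lines are kept untouched.
        is_entry = len(fields) >= 2 and not line.lstrip().startswith("#")
        if is_entry and volume in (volume_of(fields[0]), volume_of(fields[1])):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical file contents for illustration only.
sample = """\
# mirror schedules
vault01:lore_2 mirror01:lore_2 - 0 23 * *
vault01:lore_20 mirror01:lore_20 - 0 23 * *
"""
print(drop_volume(sample, "lore_2"), end="")
```

Compare the output against the original before writing it back with wrfile, since wrfile replaces the whole file.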

Next you need to remove the snapshots on the volume here also. I didn’t grab a screenshot at the time but it would not allow me to remove the base snapshot as it was in a busy state. It had (busy, snapvault) listed at the end of the snapshot name. None of the commands, even with the -f option, would allow the snap to be deleted.

Run the command: snapvault status -l plore_2. This showed that the base snapshot was idle but it still would not allow the snap to be deleted.

Orphaned Snapshot Removal Step 19

To stop the snapvault, enter advanced privilege mode using priv set advanced. Then run the snapvault stop command referencing the destination from the previous snapvault status -l command. Enter yes to stop the snapvault when prompted. This step is similar to that carried out on the snapvault controller.

Orphaned Snapshot Removal Step 20

Now you can remove the snapshots as per the earlier snap delete command.

Do the following in OnCommand System Manager:

Go back to OnCommand System Manager and refresh to ensure the snapshot copies have cleared for the volume. Do this on both of the backup controllers.

Orphaned Snapshot Removal Step 21

Once the snapshots have been deleted you will see the usage % coming down as the system clears itself up.

Connect to the SnapProtect server and open the SnapProtect console. Click on Storage in the top menu and select Array Management. Click on the main filer and click List Snaps; hopefully you will see the job number in Array Management. This can be found on the primary snapshot, for example: SP_2_7139_25097_1419289391
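When you’re matching a lot of these names up, it can help to split them apart. The Python sketch below does that and, purely as an assumption on my part, treats the trailing number as a Unix epoch timestamp so you can see roughly when the snapshot was taken. I haven’t confirmed what each of the middle fields means, so the function (a hypothetical helper) just returns them as they are.

```python
from datetime import datetime, timezone


def split_snap_name(name):
    """Split an SP_... snapshot name into its numeric fields and a creation time.

    Assumption: the last field is a Unix epoch timestamp in seconds.
    """
    prefix, *fields = name.split("_")
    if prefix != "SP" or not fields:
        raise ValueError(f"not an SP_ snapshot name: {name!r}")
    created = datetime.fromtimestamp(int(fields[-1]), tz=timezone.utc)
    return fields[:-1], created


ids, created = split_snap_name("SP_2_7139_25097_1419289391")
print(ids, created.isoformat())
```

If the decoded date lines up with the age of the snapshots you saw in System Manager, that’s a decent sanity check that you’re looking at the right ones.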

Orphaned Snapshot Removal Step 22

Orphaned Snapshot Removal Step 23

Now that the vault relationship has been removed, the old snapshots can be highlighted and deleted. This will clean up the rest of the snapshots spread across the other volumes. However, when I tried this I got the following error, as the deletion was already in progress.

Orphaned Snapshot Removal Step 24

Once the volume on the backup controllers has no more snapshots and the space utilized is 0%, the vaulted volume on the backup controllers can be taken offline and deleted. It can take a few hours for the volumes to be cleared up, depending on the size of the storage.

Now that the snapshots have been removed, the snapvault and snapmirror relationships removed and the volumes deleted, you’ll have returned the storage capacity to the storage pool.
