
Fix: VMware – Quiesced Snapshots failing – Unexpected error DeviceIoControl

I ran into an interesting problem where both the root cause and the final fix took a bit of digging to find. When running backups on VMware vSphere 5.5 with NetApp storage, some but not all VMs were failing and logging the errors below in the Windows event logs:

Event ID 57 ntfs Warning
The system failed to flush data to the transaction log. Corruption may occur.

Event ID: 137 ntfs Error
The default transaction resource manager on volume \\?\Volume{806289e8-6088-11e0-a168-005056ae003d} encountered a non-retryable error and could not start. The data contains the error code.

Event ID: 12289 VSS Error
Volume Shadow Copy Service error: Unexpected error DeviceIoControl(\\?\fdc#generic_floppy_drive#6&2bc13940&0&0#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} - 00000000000004A0,0x00560000,0000000000000000,0,0000000000353B50,4096,[0]). hr = 0x80070001, Incorrect function.


The key alert here is Event ID 12289, which was also the most misleading. It initially looked like a floppy drive issue, but there was no floppy drive attached to the VM, nor were any floppy drivers installed on it. A look around the VMware community forums led me to this thread: https://communities.vmware.com/thread/309844?start=0&tstart=0 It focused mostly on vSphere 4.1, however, and most of the advice was to install an older version of VMware Tools. Comment 27 was the jackpot: the System Reserved partition was causing the issue.
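If you have a lot of VMs to check, it can help to pick the 12289 message apart rather than read each one by eye. The snippet below is only a rough sketch, assuming the event text has been exported as a plain string in the same shape as the message above. The names `VSS_PATTERN` and `parse_vss_error` are my own for illustration and not part of any VMware or Microsoft tooling.

```python
import re

# Assumed shape of the VSS 12289 message, based on the event text above.
# The pattern does not depend on the leading "\\?\" of the device path,
# so it still matches if an export has mangled those backslashes.
VSS_PATTERN = re.compile(
    r"DeviceIoControl\((?P<device>.+?)\s+-\s+[0-9A-Fa-f]+,"
    r".*?hr\s*=\s*(?P<hr>0x[0-9A-Fa-f]{8})"
)


def parse_vss_error(message):
    """Return the device path and HRESULT from a 12289 message, or None."""
    match = VSS_PATTERN.search(message)
    if match is None:
        return None
    device = match.group("device")
    return {
        "device": device,
        "hr": match.group("hr").lower(),
        # A floppy device path on a VM with no floppy drive is the tell-tale
        # sign described in this post.
        "is_floppy": "floppy" in device.lower(),
    }


sample = (
    r"Volume Shadow Copy Service error: Unexpected error "
    r"DeviceIoControl(\\?\fdc#generic_floppy_drive#6&2bc13940&0&0#"
    r"{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} - 00000000000004A0,0x00560000,"
    r"0000000000000000,0,0000000000353B50,4096,[0]). hr = 0x80070001, "
    r"Incorrect function."
)

result = parse_vss_error(sample)
```

For the sample message, `result["is_floppy"]` is true and `result["hr"]` is `0x80070001`, which is the "Incorrect function" code shown in the event.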

So what does the System Reserved partition do?

The System Reserved partition contains the Boot Manager and Boot Configuration Data that are read when the virtual machine starts up. The VM boots from the boot loader in the System Reserved partition and then boots Windows from the System drive. The partition is also used as the location for the startup files for BitLocker Drive Encryption, so if you need BitLocker you’ll need a System Reserved partition. For Windows client OSes that’s a great feature to have, but on a server OS where BitLocker just isn’t used it’s superfluous. The System Reserved partition is created by default on OS installation, so there are two options to remediate:

  1. Remove the partition manually post installation
  2. Remove the partition from your Windows OS templates

I won’t go into the details of removing the partition from your templates here, but you can find more information over on mydigitallife.info. After finding the root cause of the initial error, I ran through those steps myself for all of our Windows templates.

As per one of the links mentioned in Comment 27 of the VMware communities thread, it’s possible to change the location of the boot files so that the partition can be removed. This information can be found over on geekshangout.com. However, those steps don’t cover how to reclaim the partition, so you are left with unallocated space sitting in front of the C drive on disk 0. I haven’t tested backups in this configuration, but I wouldn’t be surprised if it caused other issues during backup. So below I’ve listed the steps to follow so you can remove the partition as per the steps on geekshangout and then reclaim the space with GParted.

Delete System Reserved partition and reclaim space


VMware – Security vulnerability VMSA-2015-0007

VMware announced over the weekend that some major security vulnerabilities have been identified in vCenter and ESXi 5.0, 5.1 and 5.5, as well as version 6.0. Version 6.0 Update 1 is not affected. Three vulnerabilities have been identified in total, and they affect different versions; in vSphere 6.0 only the JMX RMI remote code execution is an issue.

ESXi OpenSLP Remote Code Execution

  • Allows unauthenticated users to execute code remotely on an ESXi host

vCenter Server JMX RMI Remote Code Execution

  • Allows an unauthenticated remote attacker who is able to connect to the service to execute arbitrary code on the vCenter server

vCenter Server vpxd denial-of-service vulnerability

  • Can allow a remote user to create a denial of service on the vpxd service through unsanitized heartbeat messages

The news broke on both the VMware and The Register sites, and I’d recommend reading more on both. The Register also gives some great background on how the issues were originally identified. The full advisory details, including links to the CVE references, can be viewed on the VMware Security Advisories site under VMSA-2015-0007.

If you are running vSphere 5.0, the recommendation is to upgrade to v5.0 Update 3e. For vSphere 5.1, upgrade to v5.1 Update 3. For vSphere 6.0, the recommendation is to patch with Update 1. vSphere 5.5, however, has some issues. To fix the denial-of-service and OpenSLP issues it’s advised to upgrade to vSphere 5.5 Update 2. To resolve the JMX RMI issue, however, VMware have confirmed that vSphere 5.5 Update 3, released in early September, is the fix. But a new snapshot bug has been identified in Update 3: if a snapshot is deleted in vCenter, the VM crashes. Since the majority of backup solutions rely on VMware snapshots, that would mean every backed-up VM crashing each night. Given that uptime is always a business and IT priority, it’s really not a feasible solution.

My advice would be to at least upgrade to vSphere 5.5 Update 2 if you can. Upgrade to vSphere 6.0 Update 1 if possible, but that may require considerable research and interoperability checks and may not be on your roadmap just yet. Do not install ESXi 5.5 Update 3 if your backup software depends on VMware snapshots.
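To keep the upgrade advice in one place, here is a small sketch of the decision logic as I’ve described it in this post. It is an illustration only: the function name `recommend` and the `backups_use_snapshots` flag are hypothetical, the 5.5 Update 3 branch is my own reading of the trade-off rather than VMware guidance, and the advisory itself should be checked before acting on any of it.

```python
def recommend(version, backups_use_snapshots=True):
    """Return (target_release, note) for a vSphere major.minor version.

    Encodes the recommendations from this post for VMSA-2015-0007 only.
    """
    if version == "5.5":
        if backups_use_snapshots:
            # Update 2 covers two of the three issues without the snapshot bug.
            return (
                "5.5 Update 2",
                "Fixes OpenSLP and vpxd DoS; JMX RMI is only fixed in Update 3.",
            )
        # Assumption: only consider Update 3 if nothing relies on snapshots.
        return (
            "5.5 Update 3",
            "Fixes JMX RMI, but deleting a snapshot can crash the VM.",
        )

    targets = {
        "5.0": "5.0 Update 3e",
        "5.1": "5.1 Update 3",
        "6.0": "6.0 Update 1",
    }
    if version not in targets:
        raise ValueError(f"No recommendation in this post for vSphere {version}")
    return (targets[version], "Patched release named in this post.")


for v in ("5.0", "5.1", "5.5", "6.0"):
    target, note = recommend(v)
    print(f"vSphere {v}: upgrade to {target} ({note})")
```

Passing `backups_use_snapshots=False` for 5.5 returns Update 3 instead, which only makes sense if you are sure nothing in your environment deletes snapshots.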