ESX Incorrectly Resignatured LUNs

We recently had a major storage outage after presenting existing LUNs from one ESX cluster to another. The goal was for both clusters to share the same LUNs so that we could merge them. This normally shouldn’t be an issue if the presentation is uniform (i.e. consistent LUN IDs across all hosts/initiators).

However, when a rescan was performed on the new hosts, ESX resignatured some of the LUNs. The existing ESX hosts lost connectivity to those datastores and their VMs shut down. We then had to manually edit the .vmx files to point the disks to the updated datastore locations. We had confirmed that the LUN IDs and SAN presentation were the same, so we couldn’t understand why ESX was resignaturing the LUNs.
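For reference, the .vmx edit can be sketched as a small shell function. This is a minimal sketch, not the exact procedure we followed: it assumes the disk entries in the .vmx file use absolute paths of the form /vmfs/volumes/<identifier>/..., and the function name repoint_vmx is made up for illustration. It writes the result to standard output so the change can be reviewed before the original file is replaced.

```shell
#!/bin/sh
# Sketch only: rewrite datastore references in a .vmx file.
# Assumes disks are referenced by absolute path (/vmfs/volumes/<id>/...),
# where <id> is a datastore UUID or label made of letters, digits and dashes.
# repoint_vmx is a hypothetical helper name, not a VMware tool.
repoint_vmx() {
    old_id=$1      # identifier the VM still points at
    new_id=$2      # identifier of the resignatured datastore
    vmx_file=$3    # path to the .vmx file to read
    # "|" is used as the sed delimiter because the paths contain "/".
    sed "s|/vmfs/volumes/$old_id/|/vmfs/volumes/$new_id/|g" "$vmx_file"
}
```

A typical use would be repoint_vmx OLD_ID NEW_ID vm.vmx > vm.vmx.new, followed by a diff of the two files before swapping them, with the VM powered off.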

I found a couple of VMware KB articles which described similar symptoms:

Root Cause

It turned out to be due to these two factors:

  • The affected datastores were originally formatted as VMFS2 (i.e. ESX 3.0) and had been upgraded at some point to VMFS3
  • The LVM.EnableResignature flag was enabled on the new ESX hosts but not on the old ones

Details

The messages log on the new ESX host showed that ESX detected the LUN as a snapshot, and that the LVM version was mismatched:

Sep 11 13:21:03 vmkernel: 5:20:20:13.259 cpu20:287470)LVM: 7455: Device naa.6006048000018712071753594d303739:1 detected to be a snapshot:

Sep 11 13:21:03 vmkernel: 5:20:20:13.471 cpu20:287470)WARNING: LVM: 2397: [naa.6782bcb0771cbe00167e415d0b0c0d90:3] LVM major version mismatch (device 5, current 3)

It then resignatured the LUN and prefixed the datastore name with snap-037b03e4:

Sep 11 13:21:04 vmkernel: 5:20:20:13.936 cpu20:287470)LVM: 7820: Device naa.6006048000018172071753594d303739:1 unsnapped

Sep 11 13:21:04 vmkernel: 5:20:20:13.939 cpu20:287470)LVM: 4900: Snapshot LV <snap-55656724-4d4b315c-1e142455-9e45-000423e569f1> successfully resignatured

Sep 11 13:21:04 vmkernel: 5:20:20:13.968 cpu20:287470)Vol3: 1070: [label: CORE-PROD-APP-012-DMX717-079, uuid: 4d4b315e-34cbb938-1f4a-000423e569f1] detected as a snapshot file system.

Sep 11 13:21:04 vmkernel: 5:20:20:13.970 cpu20:287470)Vol3: 871: Begin resignaturing volume label: CORE-PROD-APP-012-DMX717-079, uuid: 4d4b315e-34cbb938-1f4a-000423e569f1

Sep 11 13:21:04 vmkernel: 5:20:20:14.022 cpu20:287470)FS3: 6109: Marking HB [HB state abcdef02 offset 3354624 gen 969 stamp 12873297056919 uuid 4f8acb87-dce27b80-6e52-000423e56917 jrnl <FB 227000> drv 8.46] on vol ‘snap-037b03e4-CORE-PROD-APP-012-DMX717-079’

The hosts in the other cluster then lost connectivity because they were still looking for the original UUID, causing VMs to lose disks or shut down.
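The log entries above can be pulled out of a host’s log with a couple of sed filters. This is a rough sketch that assumes the message wording matches the excerpts in this post; the function names are made up, and the log path in the usage note is an assumption that may differ between ESX versions.

```shell
#!/bin/sh
# Sketch only: summarise snapshot/resignature activity from vmkernel messages.
# The patterns are taken from the log excerpts in this post; other ESX
# builds may word these messages differently.

# Print each device that the LVM layer flagged as a snapshot (unique, sorted).
# Reads the files given as arguments, or standard input if none are given.
snapshot_devices() {
    sed -n 's/.*Device \([^ ]*\) detected to be a snapshot.*/\1/p' "$@" | sort -u
}

# Print "<label> <uuid>" for each volume that ESX started resignaturing.
# The UUID shown is the original one that other hosts still expect.
resignatured_volumes() {
    sed -n 's/.*Begin resignaturing volume label: \([^,]*\), uuid: \([^ ]*\).*/\1 \2/p' "$@" | sort -u
}
```

For example, snapshot_devices /var/log/messages lists the affected devices, and resignatured_volumes /var/log/messages lists the datastore labels together with their original UUIDs.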

VMware confirmed that LVM version 3 was from ESX 3.0, so this datastore was originally formatted as VMFS2 and had since been upgraded to VMFS 3.46. The old ESX hosts didn’t resignature the LUN because their LVM.EnableResignature flag was set to 0 (the default).

Why was the LVM.EnableResignature flag set to 1 on the new ESX hosts? It turns out that Site Recovery Manager (SRM) turns it on when it performs test failovers or failovers. According to VMware, “this is the only immediately-known automated cause for enabling the LVM.enableResignature flag on one or more hosts.”

http://kb.vmware.com/kb/2010051: Setting LVM.enableResignature =1 remains set after a SRM Test Failover

Checking for problem LUNs

VMware recommended that we do the following:

  • Check which hosts have LVM.EnableResignature set to 1 by running this command via SSH on all hosts:
    • esxcfg-advcfg -g /LVM/EnableResignature
  • Set this to 0 on all ESX hosts:
    • esxcfg-advcfg -s 0 /LVM/EnableResignature
  • Perform a rescan
  • Check the ESX messages log and grep for "snapshot". This will show which datastores have (incorrectly) been detected as snapshot LUNs. Alternatively, run this command:
    • esxcfg-volume -l
  • If any LUNs were wrongly detected as snapshots, Storage vMotion their VMs to other LUNs
  • Then either reformat the problem LUNs, or enable the resignature flag and perform a rescan so that ESX resignatures them
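The first two steps above can be scripted across many hosts. The sketch below is only an outline: it assumes root SSH access to each host, that esxcfg-advcfg -g prints a line ending in "is <value>" (for example "Value of EnableResignature is 1"), and the function names and host names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report and clear LVM.EnableResignature on several hosts.
# Assumes root SSH access and that "esxcfg-advcfg -g" prints a line
# ending in "is <number>". Function names are hypothetical.

# Extract the trailing number from the esxcfg-advcfg -g output on stdin.
flag_value() {
    sed -n 's/.* is \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Print "<host> <value>" for every host given as an argument.
report_flag() {
    for host in "$@"; do
        # -n stops ssh from consuming the caller's standard input.
        value=$(ssh -n "root@$host" 'esxcfg-advcfg -g /LVM/EnableResignature' | flag_value)
        echo "$host ${value:-unknown}"
    done
}

# Set the flag back to 0 on every host that currently reports 1.
clear_flag() {
    report_flag "$@" | while read -r host value; do
        if [ "$value" = "1" ]; then
            ssh -n "root@$host" 'esxcfg-advcfg -s 0 /LVM/EnableResignature'
        fi
    done
}
```

For example, report_flag esx01 esx02 prints one line per host, and clear_flag esx01 esx02 resets only the hosts that report 1. The rescan and the esxcfg-volume -l check still need to be run afterwards.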
