Fixing Invalid VMs in VMware vCenter & ESX

 

We’ve had a Virtual Machine showing up in vCenter Server as Invalid (and greyed out) for a few days now. Ordinarily, I would have had a go at fixing it straight away, but it is an archive server that is not actually in use, so it has been down the priority list. Today I reached that item on the To-Do list, so it was time to get onto it.

Why Invalid?

First I had to determine why it was showing as invalid. Alas, the history of what had happened went out the door with a staff member who had recently left, so I was going to be floundering in the dark somewhat.

Firstly, I tried the initial step of removing the VM from the Inventory and re-adding it.

  1. Open vCenter Console
  2. Right-click on the VM, select Remove from Inventory
  3. Browse to the DataStore and select the <VMDIR> folder
  4. Right-click on the <servername>.vmx and select Add to Inventory
  5. Complete the wizard to place the VM in the correct group and Host.

Alas, for me that didn’t work. The machine was still greyed out and marked as invalid.

To the Command-Line!

The next step was to check for file locks on any of the VM files. Locks are often in place on these files:

  • <servername>.vswp
  • <diskname>-flat.vmdk
  • <servername>.vmx
  • <servername>.vmxf
  • vmware.log

So to check the files for my VM I did the following steps:

  1. logged onto the ESX host via SSH
  2. ran command: vmware-cmd -l and noted the full path of the VM files – /vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx
  3. ran command: cd /vmfs/volumes/<UUID>/<VMDIR>/
  4. To check which files were locked, I ran the command: touch *
    1. any file that returned device or resource busy was a locked file.
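The touch check in step 4 can be wrapped in a small loop so that only the problem files are printed. This is a rough sketch rather than what I actually typed on the day: the function name find_locked_files is made up for illustration, and it assumes a POSIX-style shell on the ESX service console. Like touch *, it updates the modification time of every file it manages to touch, and it reports any touch failure (not only device or resource busy), so treat its output as a list of candidates.

```shell
#!/bin/sh
# Hypothetical helper: print every file in a VM directory that cannot be touched.
# A failed touch (for example "Device or resource busy") is treated as a likely lock.
find_locked_files() {
    dir="$1"
    for f in "$dir"/*; do
        # Skip sub-directories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        if ! touch "$f" 2>/dev/null; then
            echo "$f"
        fi
    done
}

# Example call (placeholders as in the steps above):
#   find_locked_files /vmfs/volumes/<UUID>/<VMDIR>
```

If nothing is printed, touch succeeded on every file, which points away from a file lock as the cause.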

Who Dares Lock my VM?

So now I knew that the VM was invalid because of some locked files. It was time to find out who had locked the files and how to unlock them. Back to the command-line! To look for the culprit, I did the following:

  1. ran command:   vmkfstools -D /vmfs/volumes/<UUID>/<VMDIR>/<LOCKEDFILE>.xxx
  2. Alas, as my hosts are still on ESX 4.0, the lock details are written to the system log rather than shown on-screen
  3. ran command: tail /var/log/vmkernel
  4. here is an example of the log output from a VMware article:

    Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Lock [type 10c00001 offset 13058048 v 20, hb offset 3499520
    Hostname vmkernel: gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 mtime 1174669462]
    Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Addr <4, 136, 2>, gen 19, links 1, type reg, flags 0x0, uid 0, gid 0, mode 600
    Hostname vmkernel: 17:00:38:46.977 cpu1:1033)len 297795584, nb 142 tbz 0, zla 1, bs 2097152
    Hostname vmkernel: 17:00:38:46.977 cpu1:1033)FS3: 132: <END supp167-w2k3-VC-a3112729.vswp>

  5. the last 12 characters of the owner field (00137266e200 in this example, i.e. 00:13:72:66:e2:00) are the MAC address of the offending machine that has a lock on those files. In my case, it was another of my ESX hosts
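Picking the MAC address out of the log by eye is fiddly, so here is a small sketch that does it with sed. The function name owner_to_mac is hypothetical, and it assumes the owner field is written as one unbroken string whose final dash-separated group is 12 hex digits, such as 45feb537-9c52009b-e812-00137266e200. Lines without such a field produce no output.

```shell
#!/bin/sh
# Hypothetical helper: read vmkernel log text on stdin and print the MAC address
# found in each "owner" lock field, one colon-separated MAC per line.
owner_to_mac() {
    # 1) isolate the owner value, 2) keep its last 12 hex digits,
    # 3) insert a colon after every pair and drop the trailing colon.
    sed -n 's/.*owner \([0-9a-fA-F-]*\).*/\1/p' |
        sed -n 's/.*-\([0-9a-fA-F]\{12\}\)$/\1/p' |
        sed 's/../&:/g; s/:$//'
}

# Example call:
#   tail /var/log/vmkernel | owner_to_mac
```

For the owner value 45feb537-9c52009b-e812-00137266e200 this prints 00:13:72:66:e2:00, which can then be matched against the network adapters of the other hosts.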

Unlocking the File

There are a number of situations that can cause another ESX Host to lock the files of a VM. I suspected that a transfer between Hosts had not been successful. I chose the easiest option first – Check to see if the offending ESX host (identified by MAC Address) was still trying to manage that VM.

  1. log onto newly-identified host via SSH
  2. ran command: vmware-cmd -l
  3. VM is listed there (surprise, surprise)
  4. To remove the VM from the old host, run command: vmware-cmd -s unregister /vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx
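Steps 2 to 4 can be combined so that the unregister only runs when the VM really is registered on that host. A minimal sketch, assuming vmware-cmd -l prints one .vmx path per line; the function name is_registered is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given .vmx path appears in a list of
# registered VMs read from stdin (one path per line, as printed by vmware-cmd -l).
is_registered() {
    # -F: fixed string, -x: whole line must match, -q: quiet (exit status only)
    grep -Fxq -- "$1"
}

# Example call on the host that holds the lock (placeholders as in the steps above):
#   VMX="/vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx"
#   vmware-cmd -l | is_registered "$VMX" && vmware-cmd -s unregister "$VMX"
```

Matching the whole line avoids unregistering a different VM whose path merely contains the same server name.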

Once I had done that, I repeated the first step of the day by removing the VM from Inventory and re-adding it by browsing to the DataStore.

Happy Days are Here Again!

I’m sure that this was some of the ugliest fault-finding any VMware Administrator has ever seen and any VCPs reading this will be curled up in the corner, screaming “find a happy place”. However, for a guy who hasn’t delved below the GUI too often and shudders at the thought of command-lines and case-sensitive syntax, it got the job done and the server working again,

and that is what I needed to get done…


3 Responses to Fixing Invalid VM’s in VMWare vCenter & ESX

  1. Veeru says:

    One of my storage servers hosting an NFS datastore went down due to network issues, and when it came back up, many VMs were in an ‘Unknown’ state. When I browsed the datastore they were there. I removed the VMs from inventory and tried to add them back, only to find the machines greyed out and marked as invalid. I logged into the ESX CLI, but doing touch * in the VM directory didn’t show any locked files. I don’t have VC managing the ESX. Any ideas on this?

    • You can have this issue when the VMs are also added to hosts other than the one you are working on at the time. This is sometimes called “Ghetto vMotion”: you don’t have the proper VM license but still want something like a high-availability cluster, so VMs on shared NFS/iSCSI storage are added on other hosts as well. I had the same problem on an iSCSI share, just like you had. The extra node went grey/invalid because the other node had locked the inventory. This caused more trouble in my case, because I could not create snapshots with my Thinware backup.
