We’ve had a Virtual Machine showing up in vCenter Server as Invalid (and greyed out) for a few days now. Ordinarily, I would have had a go at fixing it straight away, but it is an archive server that is not actually in use, so it has been down the priority list. Today was the day I reached that item on the To-Do list, so it was time to get onto it.
First I had to determine why it was showing as invalid. Alas, the history of what had happened left with a staff member who had recently departed, so I was going to be floundering in the dark somewhat.
First, I tried the standard fix of removing the VM from the Inventory and re-adding it:
- Open the vCenter console
- Right-click on the VM and select Remove from Inventory
- Browse to the Datastore and select the <VMDIR> folder
- Right-click on <servername>.vmx and select Add to Inventory
- Complete the wizard to place the VM in the correct group and on the correct Host.
Alas, for me that didn’t work. The machine was still greyed out and marked as invalid.
To the Command-Line!
The next step was to check for file locks on the VM files, as locks are often the cause of an invalid VM. To check the files for my VM, I did the following:
- Logged onto the ESX host via SSH
- Ran the command: vmware-cmd -l and noted the full path of the VM files: /vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx
- Ran the command: cd /vmfs/volumes/<UUID>/<VMDIR>/
- To check which files were locked, ran the command: touch *
- Any file that returned “Device or resource busy” was locked.
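The touch-based check above can be wrapped up as a small shell function (a sketch only; `check_locks` is my own hypothetical name, and the `/vmfs/volumes/...` path is whatever vmware-cmd -l reported for your VM):

```shell
# check_locks: attempt to update the timestamp of every file in a directory.
# A file whose timestamp cannot be updated ("Device or resource busy") is
# reported as locked; all other files are silently left alone.
check_locks() {
  dir="$1"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    touch "$f" 2>/dev/null || echo "LOCKED: $f"
  done
}

# On the ESX host you would run it against the path noted earlier, e.g.:
# check_locks /vmfs/volumes/<UUID>/<VMDIR>
```

On a healthy VM directory this prints nothing; each locked file gets its own LOCKED: line.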
Who Dares Lock my VM?
So now I knew that the VM was invalid because of some locked files, and it was time to find out who had locked them and how to unlock them. Back to the command-line! To look for the culprit, I did the following:
- Ran the command: vmkfstools -D /vmfs/volumes/<UUID>/<VMDIR>/<LOCKEDFILE>.xxx
- Alas, as my hosts are still ESX 4.0, I then had to look in the system logs instead of getting the output on screen:
- tail /var/log/vmkernel
- Here is an example of the log output from a VMware article
- The final segment of the owner field (00137266e200 in this example) is the MAC address of the offending machine that has a lock on those files. In my case, it was another of my ESX hosts.
Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Lock [type 10c00001 offset 13058048 v 20, hb offset 3499520
Hostname vmkernel: gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 mtime 1174669462]
Hostname vmkernel: 17:00:38:46.977 cpu1:1033)Addr <4, 136, 2>, gen 19, links 1, type reg, flags 0x0, uid 0, gid 0, mode 600
Hostname vmkernel: 17:00:38:46.977 cpu1:1033)len 297795584, nb 142 tbz 0, zla 1, bs 2097152
Hostname vmkernel: 17:00:38:46.977 cpu1:1033)FS3: 132: <END supp167-w2k3-VC-a3112729.vswp>
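Rather than eyeballing the log, the owner MAC can be pulled out with sed (a sketch assuming the log format shown above; `extract_lock_owner_mac` is my own hypothetical helper, and the service console’s standard sed is assumed):

```shell
# extract_lock_owner_mac: read vmkernel log lines on stdin and print the
# trailing segment of any "owner" field, which is the locking host's MAC.
extract_lock_owner_mac() {
  sed -n 's/.*owner \([0-9a-f]*\)-\([0-9a-f]*\)-\([0-9a-f]*\)-\([0-9a-f]*\).*/\4/p'
}

# Using the sample log line from above:
echo "Hostname vmkernel: gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 mtime 1174669462]" \
  | extract_lock_owner_mac
# prints: 00137266e200
```

On the host itself you would feed it the live log instead, e.g. tail /var/log/vmkernel | extract_lock_owner_mac, then match the MAC against your hosts’ NICs.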
Unlocking the File
There are a number of situations that can cause another ESX host to lock the files of a VM. I suspected that a transfer between hosts had not been successful. I chose the easiest option first: check whether the offending ESX host (identified by MAC address) was still trying to manage that VM.
- Logged onto the newly-identified host via SSH
- Ran the command: vmware-cmd -l
- The VM was listed there (surprise, surprise)
- To remove the VM from the old host, ran the command: vmware-cmd -s unregister /vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx
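The clean-up on the offending host can be sketched as below. Since vmware-cmd only exists on the ESX service console, this version defaults to a dry run that just prints the commands (the `run` wrapper and DRY_RUN switch are my own additions, not part of vmware-cmd):

```shell
# Run a command for real, or just print it when DRY_RUN=1 (the default here,
# so nothing is unregistered by accident).
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# List the VMs this host thinks it owns, then unregister the stray one.
# Substitute your own UUID, directory, and .vmx name.
run vmware-cmd -l
run vmware-cmd -s unregister "/vmfs/volumes/<UUID>/<VMDIR>/<SERVERNAME>.vmx"
```

Set DRY_RUN=0 on the actual host once you are sure the path is the right VM.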
Once I had done that, I repeated the first step of the day: manually removing the VM from the Inventory and re-adding it by browsing to the Datastore.
Happy Days are Here Again!
I’m sure this was some of the ugliest fault-finding any VMware administrator has ever seen, and any VCPs reading this will be curled up in the corner, screaming “find a happy place”. However, for a guy who hasn’t delved below the GUI too often and shudders at the thought of command lines and case-sensitive syntax, it got the job done and the server working again,
and that is what I needed to get done…