Host takes a long time to reconnect to vCenter and performs slowly due to corrupt esx.conf

I noticed a number of my Dell PowerEdge 2900 ESX 3.5 U4 hosts would take a long time to reconnect to vCenter after a reboot, or would appear as ‘disconnected’ or ‘not responding’ for up to 4 hours.

When logging into the console I see the following message: “configuration changes were not saved successfully during previous shutdown”.

An /etc/vmware/esx.conf.LOCK file is present, and sometimes other esx.conf.RaNdOm files (with random suffixes) too.  When I delete the .LOCK file it just comes back again, and shutting down the host throws up errors because the file is locked.  When the host restarts I see the same error when logging in.

Running esxcfg-vswitch -l gives the following: “Listing failed for vswitch: vSwitch0, Error: Error interacting with configuration file /etc/vmware/esx.conf: Failed attempting to lock file. Another process has locked the file for more than 10 seconds. The process holding the lock is /usr/lib/vmware/hostd/vmware-hostd/etc/vmware/hostd/config.xml-u (2237). This operation will complete if it is run again after the lock is released.”

The esx.conf file is huge (sometimes between 600K and 9MB) and contains spurious entries for resource pools – all of which are empty.
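To get a rough idea of how bloated a given esx.conf is, something like the following Python sketch could be run against a copy of the file on a workstation. It reports the file size, the total line count, and how many lines contain a marker string. The function name `summarize_conf` is hypothetical, and the default marker `"pool"` is an assumption: I haven't reproduced the exact key format of the resource pool entries here, so adjust it to whatever the spurious lines in your file actually look like.

```python
import os


def summarize_conf(path, marker="pool"):
    """Return (size_in_bytes, total_lines, lines_containing_marker).

    `marker` is an assumed substring for spotting resource pool
    entries; it is not taken from any VMware documentation.
    """
    size = os.path.getsize(path)
    total = 0
    matching = 0
    # errors="replace" keeps the scan going even if the file has odd bytes
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            total += 1
            if marker in line:
                matching += 1
    return size, total, matching
```

Calling `summarize_conf("esx.conf.copy")` on a copy from an affected host and on one from a healthy host should make the difference obvious (the file name here is just an example).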

The hostd.log file shows errors such as: “Destroying unregistered VMkernel resource group ‘host/user/pool0/pool234/pool1’”.  Meanwhile, top shows the hostd process running at 100% CPU for up to 4 hours while it works through the incorrect resource pools in the esx.conf file.  A tail of /var/log/vmware/hostd.log shows it takes about 5 seconds to process each incorrect resource pool line.
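The 5-seconds-per-entry figure makes it possible to estimate how long hostd will stay busy. Below is a minimal Python sketch, assuming you have a copy of hostd.log to read: it counts the lines containing the message quoted above and multiplies by 5 seconds. The function names are hypothetical, and the 5-second cost is only the approximate figure I observed, not a documented value.

```python
MESSAGE = "Destroying unregistered VMkernel resource group"
SECONDS_PER_ENTRY = 5  # approximate, observed by tailing hostd.log


def count_destroy_messages(lines):
    """Count log lines that contain the 'Destroying unregistered' message."""
    return sum(1 for line in lines if MESSAGE in line)


def estimated_hours(entry_count, seconds_per_entry=SECONDS_PER_ENTRY):
    """Estimate the hours hostd needs to process `entry_count` entries."""
    return entry_count * seconds_per_entry / 3600.0
```

At roughly 5 seconds per entry, 2,880 spurious entries would account for the full 4 hours (2,880 × 5 s = 14,400 s), which fits with an esx.conf containing thousands of empty resource pools.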

The clusters these hosts belong to used to have about 30 resource pools, which were removed recently.  I checked a host in a cluster which never had resource pools: its esx.conf file looks fine, is only 40KB in size, and that host doesn’t show these problems.

So it appears that somehow the esx.conf file has become corrupted, as it contains thousands of empty resource pools.  After finding nothing on Google I finally found a post from another user with the same problem.

VMware Support are looking into this issue and have confirmed it’s a bug. It will shortly be fixed in ESX v4.0 and they say ESX v3.5 will be fixed soon after. :-)
