Setting up vCenter 5.0.x Inventory Service wrapper.log rotation

I recently noticed that free disk space on the C:\ drive of one of our vCenter servers kept running low. The server runs Windows Server 2008 R2. If the disk eventually fills up, the vCenter service will fail and stop.

 

 

I used the free utility “TreeSize” to inspect the drive and see where all the disk space was being used up – the culprit was the C:\ProgramData\VMware\Infrastructure\Inventory Service\Logs\wrapper.log file – sitting at a massive 16GB in my case.

By default this log file seems to be configured to have no limits in size. I am not sure what caused the log file to grow so rapidly over the course of a week or so, but I’m sure closer inspection of the contents of the log file will reveal more detail.

For now, I will just point anyone having a similar issue to the steps required to enable log rotation and a file size limit on this wrapper.log file, which will prevent your vCenter server from running out of disk space. These steps are taken from a VMware KB article, which I will link to below.

 

The high-level process is as follows:

 

  • Stop the vCenter service
  • Rename the wrapper.log file to something else (e.g. wrapper.log.bak). Once everything is back up and a new log file is in place, the old file can be compressed and stored away, or deleted
  • Open the folder C:\Program Files\VMware\Infrastructure\Inventory Service\conf and locate the wrapper.conf file
  • Edit the following two lines so that they read:
    • wrapper.logfile.maxsize=100m (original value is 0)
    • wrapper.logfile.maxfiles=2 (original value is 0)
  • Save the file and close it
  • Start the vCenter service again
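
If you would rather script the two edits than make them by hand, here is a minimal Python sketch. It is not from the KB article: the function name set_rotation_limits is my own, and it assumes wrapper.conf uses plain key=value lines, as the two settings above suggest. Stop the service and keep a backup copy of wrapper.conf before trying anything like this.

```python
import re


def set_rotation_limits(conf_text, maxsize="100m", maxfiles="2"):
    """Return conf_text with the two wrapper.logfile.* limits set.

    Assumption: settings are plain key=value lines. Lines that start
    with a comment character are left untouched.
    """
    wanted = {
        "wrapper.logfile.maxsize": maxsize,
        "wrapper.logfile.maxfiles": maxfiles,
    }
    seen = set()
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*([\w.]+)\s*=", line)
        key = match.group(1) if match else None
        if key in wanted:
            # Replace the existing value (0 by default, meaning no limit).
            out.append(f"{key}={wanted[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append any setting that was not present in the file at all.
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

To use it, read wrapper.conf into a string, pass it through set_rotation_limits, and write the result back to the same file before starting the service again.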

The actual VMware KB article details the above process.

 

Troubleshooting the Autolab vCloud Director 1.5.1 installation

I have had this issue twice now, where deploying vCD via the Autolab PXE boot option on the vCD VM fails. As far as I can tell, the process seems to fail on the Oracle Express DB installation, due to the RPM not being a valid package.  The vCloud Director steps seem to be the same for Autolab 1.0 or 1.1, so the following applies to both.

error: /root/oracle-xe-11.2.0-1.0.x86_64.rpm: not an rpm package (or package manifest)

You can see the error I was getting in the screenshot I captured during boot time below. I had checked the RPM file and everything else to ensure it was in place, and indeed it was. Even vCD installs via the script, although it of course does not work due to the database not being there.

 

 

Here is the process I used to correct my vCD install.

  • Allow the VM to finish booting, even with the missing Oracle DB.
  • Use PuTTY to SSH to the vCD VM, either directly from your VC or DC VM or, if you have the route set up, from your host machine (if you are using VMware Workstation, for example). Default credentials are in the Autolab setup guide document.
  • Open up the “Build” share on the NAS VM and locate the vcd-install script. Default location: \\192.168.199.7\Build\Automate\CentOS\vcd-install (open this with a text editor)
  • Locate the method for each section of the install script; there is one for each stage of the process. For each method, copy out the entire block, paste it into a new text document, and remove any exclamation marks from the “echo” parts of the script. When I stepped through this script manually in PuTTY, the shell misinterpreted the exclamation marks (most likely bash history expansion, which is active in interactive sessions), so I removed them. You’ll need to do this for the script blocks of the following sections:
    • verify() {}
    • installOracle() {}
    • configureOracle() {}
    • generateCertificates() {}
    • installvCD() {}
    • configurevCD() {}
  • Remember to copy the whole block, including the start and end braces {}. Paste it into a new text document, remove the exclamation marks, then copy and paste it back into your shell in PuTTY. Hit enter, and the method will be defined and ready for use.
  • Once all the methods have been copied in, you can simply type the name of a method followed by enter to execute it. Doing it this way lets you step through the process manually and figure out where any remaining issues may be. This script is normally executed during the PXE boot installation process, so you don’t really get a chance to track through it slowly.
  • Run each method in turn until you reach and complete the last one, “configurevCD”:
    • verify
    • installOracle
    • configureOracle
    • generateCertificates
    • installvCD
    • configurevCD
  • You may find that the generateCertificates and installvCD methods echo out that they had already been completed previously. This is fine.
  • After configurevCD finishes, all being well, vCD should be started and you should be able to browse to https://vcd.lab.local and finish the initial configuration via the vCD web page.
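
Removing the exclamation marks by hand gets tedious across six blocks, so here is a minimal Python sketch of the same clean-up. The function name strip_echo_exclamations is my own, and it assumes each echo statement sits on its own line. It only touches lines that begin with echo, so tests such as [ ! -f ... ] elsewhere in a block are left alone.

```python
def strip_echo_exclamations(script_text):
    """Remove '!' characters from lines that begin with an echo command.

    Assumption: each echo statement sits on its own line. All other
    lines are returned unchanged.
    """
    cleaned = []
    for line in script_text.splitlines():
        if line.lstrip().startswith("echo"):
            line = line.replace("!", "")
        cleaned.append(line)
    return "\n".join(cleaned)
```

Paste a copied block into a string, run it through the function, and paste the result into your PuTTY session.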

 

Other tips to try would be to:

  • MD5 hash check the RPM of the Oracle Express database that you download and place in your Build share – make sure it is not a corrupted file
  • Ensure you have the correct version of vCD and the Oracle Express database downloaded

 

vMetrics for WordPress blogs updated to version 1.1

I spent a little bit of time updating my vMetrics plugin for WordPress blogs. To give you a brief run-down, vMetrics allows you to display information from your VMware vCenter Cluster or ESX hosts / lab on your WordPress blog. It works with vSphere 4, 5 and 5.1.

 

 

Change log for version 1.1:

  • Added new metrics section for hardware information (Model and Vendor of first host in cluster – this is editable in the PowerCLI script)
  • Added configurable widget title section for Hardware
  • Updated the PowerCLI updater script to use a DO WHILE loop, so you can run the script once on a management machine and it will keep updating your blog vMetrics every 30 minutes. Thanks @dawoo for the idea 🙂
  • Added PowerCLI section to send the vendor and model type of the first ESX host it finds back to vMetrics so that you can display this information in the widget too
  • Cleaned up PHP in main plugin code

You can take a look at the main plugin page here or use the links below to download the latest version right away. Installation and configuration steps can be found on the main plugin page.

Latest version downloads (get the plugin and updater script):

[download id=”22″]
[download id=”23″]

HP N54L Microserver now listed on HP website

I am a big fan of HP’s Microserver range. They make for excellent home lab hardware, and I currently have 2 x N40L models running a small vSphere 5.1 cluster for testing, blogging and study purposes.

 

It looks like HP have now officially listed their new Microserver model on their website: the N54L. The most notable change seems to be a much beefier CPU. The original N36Ls had a 1.3GHz AMD processor, with a slight improvement to 1.5GHz on the N40Ls. The CPU has always been the weak point for me, but it has been enough to get by on. The N54L models are now apparently packing 2.2GHz AMD Turion II Neo processors. This is a fairly big clock speed improvement over the N40L and should bring some good gains for those running these as bare-metal hypervisor hosts.

The two models being listed at the moment are:

  • HP ProLiant G7 N54L 1P 2GB-U Non-hot Plug SATA 250GB 150W PS MicroServer
  • HP ProLiant G7 N54L 1P 4GB-U 150W PS MicroServer

Adding vCenter Server to Active Directory domain and disconnecting ESXi hosts issue

The other day I came across this issue. It was quite late at night, so it took me a little longer than I would have liked to realise what the problem actually was.

I had a vCenter 5.0 server which had not been joined to the local Active Directory domain. My goal was to get this added to the rest of the AD domain. After adding the vCenter server to the domain, rebooting, and checking that all the VMware services had started up correctly afterwards, I connected the vSphere client and saw that all the ESXi hosts were in a disconnected state.

At this point I tried right-clicking a host and manually connecting it. This worked, but only for 60 seconds or so, and then it disconnected again. Whilst it was connected it was manageable, and of course all the VMs on each host were still fine. I tried restarting the management agents on a host and retrying the procedure, but this didn’t help either. My next step was to reboot an ESXi host that didn’t have anything critical running. Still nothing at this point.

So I decided to consult the VMware vpxd log files on the vCenter server. Consult this VMware KB article to see where to find these logs.

Before opening the latest vpxd.log file, I tried the reconnect on a host again using the vSphere client, and watched for the disconnect. At the exact time I noticed the host appear disconnected again, I noted down the time on the system clock, then opened the vpxd logs to navigate to this time and take a look. Here is what I found:

2013-01-04T00:00:22.121Z [02504 warning 'Default'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-04T00:00:22.121Z [02504 warning 'Default'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
2013-01-04T00:00:22.121Z [02504 warning 'Default'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-28
2013-01-04T00:00:22.121Z [02504 warning 'Default'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-04T00:00:22.121Z [02504 error 'Default'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-28, marking host as notResponding
2013-01-04T00:00:22.126Z [02504 warning 'Default'] [VpxdMoHost] host connection state changed to [NO_RESPONSE] for host-28

This clearly shows the issue and points to a connectivity problem of some sort. Looking up these specific errors led me over to this VMware KB article, and it was at this point that it suddenly dawned on me: with the late night I had carelessly overlooked the Windows Firewall. Windows Firewall has a separate profile for domain networks, and this server had just joined the domain, so the existing firewall rules for vCenter, which were previously enabled only for the “Public” profile, were not enabled for “Domain”.

Timing the issue also revealed that it was 60 seconds before hosts disconnected again. So the issue here was that port 902 (UDP), which carries the host heartbeat between the ESXi hosts and vCenter, was being blocked by the firewall on the vCenter server. Enabling the existing rule for the “Domain” profile fixed the issue, and as soon as that was applied all hosts reconnected by themselves. I also took the time to ensure the other vCenter firewall exceptions were correctly configured.
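
If you want to time the disconnects from the log itself rather than from the system clock, here is a minimal Python sketch that pulls the host state changes out of vpxd.log text. The pattern is based only on the line format in the excerpt above, so treat it as an assumption that other vCenter versions log the same way. The function name find_state_changes is my own.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2013-01-04T00:00:22.126Z [02504 warning 'Default'] [VpxdMoHost] host
# connection state changed to [NO_RESPONSE] for host-28
STATE_CHANGE = re.compile(
    r"^(\S+) \[[^\]]*\] \[VpxdMoHost\] host connection state changed to "
    r"\[(\w+)\] for (\S+)",
    re.MULTILINE,
)


def find_state_changes(log_text):
    """Return a list of (timestamp, host_moref, state) tuples."""
    return [
        (datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ"), host, state)
        for stamp, state, host in STATE_CHANGE.findall(log_text)
    ]
```

Subtracting the timestamps of consecutive entries for the same host gives the interval between state changes, which in my case would have shown the 60 second pattern.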

 

 

To fix this, I enabled the Domain profile on the firewall rule in question.

 

Lastly, when examining VMware log files and settings, you may come across references to VMs, hosts, or other VMware “objects” with names such as “host-28” or “vm-07”. This is VMware’s way of referring to objects by what is called a MoRef, or “managed object reference”. You may know host-28 as esxi03.yourdomain.local, for example, so I thought I would include a handy tip for working out the MoRef of an ESXi host to help with those vpxd.log diagnostics. Let’s say you find an interesting error mentioning MoRef “host-28” and you don’t know which host this is. You can use PowerCLI to list the MoRefs of your hosts and then match the reference to the actual host name. Use this bit of script to achieve this:

 

# List every host, sorted by name, alongside its managed object reference (MoRef)
Get-VMHost | Sort-Object Name | Select-Object Name,@{Name="MoRef";Expression={$_.ExtensionData.MoRef}}
Working out the MoRef of hosts using PowerCLI