vSphere 5 & HA Heartbeat Datastores

 

I was busy updating my vSphere lab from 4.1 to 5 and ran into a warning on the first host I upgraded to ESXi 5.0. It read: “The number of vSphere HA heartbeat datastores for this host is 1, which is less than required: 2”. The message itself is fairly self-explanatory, but it prompted me to dig a little deeper, as I immediately knew it must relate to new functionality.

 

The Configuration Issue message

 

Prior to vSphere 5.0, if a host failed, or was merely isolated on its Management Network, HA would restart the VMs that were running on that host and bring them up elsewhere. (I have actually seen this happen in our ESX 4.0 environment before!) With vSphere 5.0, HA has been overhauled, and I believe this new Datastore Heartbeating feature is part of making HA more intelligent and better able to make decisions when the HA master host is isolated or split off from the other hosts. Datastore Heartbeating should help significantly with HA-initiated restarts, allowing HA to more accurately distinguish between a host that has actually failed and one that has simply been cut off from the others on the network.

 

vCenter will automatically choose two datastores to use for the Datastore Heartbeating functionality. You can see which have been selected by clicking on your cluster in the vSphere client, choosing “Cluster Status”, and then selecting the “Heartbeat Datastores” tab.

 

Cluster Status - viewing the elected HA Heartbeat Datastores
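If you prefer PowerCLI, you can also pull this information out of the vSphere API. The following is just a rough sketch: it assumes a cluster named “LabCluster” (rename for your environment) and that the 5.0 API exposes the elected datastores via the HeartbeatDatastoreInfo property returned by RetrieveDasAdvancedRuntimeInfo – treat it as a starting point rather than something thoroughly tested.

# Hypothetical cluster name - replace with your own
$cluster = Get-Cluster -Name "LabCluster"
# Drop down to the vSphere API view of the cluster object
$clusterView = $cluster | Get-View
# Retrieve the HA runtime info and resolve the elected heartbeat datastores to their names
$dasInfo = $clusterView.RetrieveDasAdvancedRuntimeInfo()
$dasInfo.HeartbeatDatastoreInfo | ForEach-Object {
    (Get-View -Id $_.Datastore).Name
}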

 

Without going into too much detail, this mechanism works with file locks on the datastores elected for this purpose. HA can determine whether a host has failed or is just isolated or partitioned on the network by checking whether these files are still being updated. After my lab upgrade I noticed a new folder on some of my datastores and wondered at first what these new files were doing there! If you take a look at the contents of the datastores listed in your Heartbeat Datastores tab, you should see the files that HA keeps a lock on for this functionality to work.

 

Files created on HA Heartbeat Datastores for the new functionality

 

So, if you notice this configuration issue message, chances are the ESXi 5 host in question simply doesn’t have enough datastores – this is likely to be quite common in lab environments, as traditionally we don’t tend to add many (well, at least I don’t!). In my case this was a test host used to trial the update from 4.1 to 5, and I only had one shared datastore added. After adding my other two datastores, from my FreeNAS box and an HP iSCSI VSA, then selecting “Re-configure for HA” on the ESXi host, the message disappeared as expected. I believe there are some advanced settings you could also use to change the number of datastores required for this feature, but I have not looked into these in any detail yet. Generally, it is best to stick with VMware defaults (or so I say), as they will have been thought out carefully by the engineers, and changing advanced settings is usually not supported by VMware anyway. However, if you are short on datastores to add and just want to get rid of the error in your lab environment, then changing this shouldn’t be a problem.
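If you do want to experiment with those advanced settings, something along these lines should work from PowerCLI. This is an unverified sketch: it assumes the advanced option names are das.ignoreInsufficientHbDatastore (to suppress the warning) and das.heartbeatDsPerHost (to change how many heartbeat datastores HA wants, reportedly in the range 2 to 5), and it uses a placeholder cluster name.

# Hypothetical cluster name - replace with your own
$cluster = Get-Cluster -Name "LabCluster"
# Option 1: suppress the "insufficient heartbeat datastores" configuration issue
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.ignoreInsufficientHbDatastore" -Value "true" -Confirm:$false
# Option 2: change the number of heartbeat datastores HA should use per host
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.heartbeatDsPerHost" -Value "3" -Confirm:$false

Either way, you would still need to select “Re-configure for HA” on the hosts afterwards for the change to be picked up.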

PowerCLI – Get HA Restart Priority and Memory Overhead for VMs

 

Another day, another PowerCLI automation script. Here is a handy one-liner that will fetch a list of all the VMs in your environment that are currently powered on.

 

The interesting bit of information retrieved is each VM’s current HA Restart Priority. This is returned whether the VM has its own HA Restart Priority defined or is simply inheriting the setting from the cluster it runs in. Another interesting figure we fetch is the VM’s current Memory Overhead. We also pull in some other useful information, such as which host the VM runs on and its RAM and vCPU assignments, and sort the whole list by VM name. Remember that the Memory Overhead reported back is the VM’s overhead at that moment in time – it can change, and there is quite a bit involved in how it is calculated. Further reading on Memory Overhead for VMs can be done here; see Table 3-2 on page 30 of the linked PDF for the specifics.

 

(Get-VM | Where {$_.PowerState -eq "PoweredOn"}) | Select Name,PowerState,NumCpu,MemoryMB,VMHost,@{N="MemoryOverhead";E={$_.ExtensionData.Runtime.MemoryOverhead/1MB}}, HARestartPriority | Sort Name | ft

 

If you don’t want the result formatted as a table, just remove the “| ft” at the end of the one-liner, and if you want to push the results into a CSV file, you can replace the “ft” with “Export-Csv C:\yourfile.csv”, as shown below. If you have any improvements to the above script, suggestions or comments, please do add them!
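For reference, the CSV variation would look something like this (the output path is just a placeholder, and -NoTypeInformation simply stops PowerShell writing its type header as the first line of the file):

# Same pipeline as above, but exporting to CSV instead of formatting as a table
(Get-VM | Where {$_.PowerState -eq "PoweredOn"}) | Select Name,PowerState,NumCpu,MemoryMB,VMHost,@{N="MemoryOverhead";E={$_.ExtensionData.Runtime.MemoryOverhead/1MB}},HARestartPriority | Sort Name | Export-Csv C:\yourfile.csv -NoTypeInformation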

How to set up a VMware vSphere Lab in Virtual Machines, with DRS and HA

 

I recently wrote a (reasonably!) lengthy article on how to set up your own VMware vSphere lab or test environment consisting entirely of Virtual Machines, running off one piece of host hardware. This is really handy, as a lot of people new to virtualization think they need to purchase full-on server equipment to create a whitebox, or find second-hand servers on eBay. Even more often, they make the mistake of overlooking the CPU feature set required to run vSphere – hardware virtualization – buying 64-bit capable servers (good) that lack the Intel VT or AMD-V feature set required for vSphere (bad!)

 

This is where running everything virtualized comes in really handy. As well as keeping your hardware and lab requirements down, you have everything you need in one installation of VMware Workstation. You’ll also be able to test out some really cool features that vSphere / vCenter Server has to offer – such as HA (High Availability) and DRS (Distributed Resource Scheduler). In the article I also reference a few best practices to follow when configuring the real deal for production use. I hope this comprehensive guide is useful for those of you looking to set something like this up!

 

The VMware lab – nested VMs running on virtualized ESXi hypervisors.

 

Read the article here on Simple-Talk.com to get started and see how it’s all done!

 

 

How to restart a slave FortiGate firewall in an HA cluster

Here’s a quick how-to on restarting a specific member of a High Availability FortiGate hardware firewall cluster. I have only tested this on a cluster of FG60 units, but I am quite sure the steps would be similar for a cluster of FG100s, FG310s and so on.

The “get system ha status” CLI output

First of all, you may want to set up some monitoring of the various WAN connections on the HA cluster. In theory, restarting the slave unit should not affect these connections, as the master unit is the one handling all the work – the slave is merely there to take over should things go pear-shaped on the master. While the slave restarts you can watch your ping statistics or other connections just to make sure everything stays up whilst it reboots, as in the example below.
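For example, from a Windows machine you could leave a continuous ping running against an address that routes out over the WAN connection you want to watch (x.x.x.x here is just a placeholder):

ping -t x.x.x.x     (the -t switch keeps the ping running until you stop it with Ctrl+C)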

1. Start by logging in to the web interface of your firewall cluster: https://ipaddress

2. Specify a custom port number if you have the management GUI on a custom port, for example https://ipaddress:555

3. Log in and look for “HA status” under the status area – this should be the default page that loads. It should show as “Active-passive” if this is the mode your HA cluster is in. Click the [Configure] link next to this.

4. This will give you an overview of your HA cluster – you can view which unit is the Master and which is the slave. This step is optional and just gives you a nice overview of how things are looking at the moment. Click “View HA statistics” near the top right if you would like to view each unit’s CPU/Memory usage and other statistics.

5. Return to the “Status” home page of your firewall GUI. Click in the “CLI Console” black window area to get to your console. (Optionally, you could also just SSH in if you have this enabled).

6. Type the following command to bring up your HA cluster details: get system ha status

7. This will show which firewall is master and slave in the cluster e.g.

Master:129 FG60-1 FWF60Bxxxxxxxx65 1
Slave :125 FG60-2 FWF60Bxxxxxxxx06 0

Look for the number right at the end of each line – in the above example the Slave unit has the number “0”. Note this down.

8. Next, enter the following command: execute ha manage x

where “x” is the number you noted down in step 7.

This will switch your management console to that particular firewall unit – i.e. the slave unit in our case. You should notice the command prompt change to reflect the name of the newly selected HA member.

9. Enter the following command to reboot the slave: execute reboot

10. Press “Y” to confirm and reboot the slave.
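Put together, the CLI portion of the process looks something like this (the index values will obviously differ depending on your own cluster):

get system ha status     (lists the cluster members – note the trailing number on each line)
execute ha manage 0      (switches the management session to the unit with that index – the slave in this example)
execute reboot           (reboots the unit you are now managing – confirm with “Y” when prompted)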

Monitor your ping/connection statistics to make sure nothing drops. Give the unit a minute or so to boot up again, then return to the HA statistics page to confirm everything looks good.

That is all there is to it.

Installing VMware ESX using a Dell DRAC card

Here is a how-to on installing VMware ESX 3.5 using a DRAC (Dell Remote Access Controller) card to access the server. I was installing a new cluster in a Dell M1000e Blade Centre at work the other day and wrote this process up so that it is documented for anyone else doing it in the future.

Just for interest’s sake, the basic specs of the system are:

1 x Dell M1000e Blade Centre
3 x Redundant 2000W+ power supply units
16 x Dell M600 blades (each with 2 x quad-core Xeon CPUs and 32GB RAM)

1. Connect to the M1000e’s chassis DRAC card
a. Browse to the chassis DRAC card (https://x.x.x.x) – use the IP for that particular blade centre’s DRAC card.
b. Log in with the agreed DRAC user credentials or, if this is a new setup, the defaults (username: root, password: calvin).

The DRAC login page

2. Select the boot order for the blade and power it up
a. Choose the blade server number that you will be working with from the Servers list on the left side.
b. Click on the Setup tab, choose Virtual CD/DVD as the first boot device, then click Apply.
c. Select the Power Management tab, choose Power On, then click Apply.

Configuring the boot order for the blade

3. Go to the iDRAC console of the blade server
a. Click on Launch iDRAC GUI to access the iDRAC for the blade you have just powered on.
b. You will need to log in again, as this is another DRAC we are connecting to (this time the DRAC for the actual blade server rather than the chassis).

Launching the iDRAC GUI for the blade

4. Configure the mouse
a. Click on the Console tab near the top of the screen, then click the Configuration button.
b. In the mouse mode drop-down, select Linux as the mouse type, then click Apply.

Configuring the mouse mode

5. Launch the console viewer
a. From the Console tab we can now select the Launch Viewer button.
b. An ActiveX popup might appear – allow it, and the DRAC console should then appear with the server in its boot process.

6. Mount Virtual ISO media for installation disc (ESX 3.5)
a. Click on Media, and then select Virtual Media Wizard.
b. Select ISO image and then browse to the ISO for ESX 3.5 – this could be on your local drive or a network share.
c. Click the Connect CD/DVD button to mount the ISO.
d. Your boot order should now be configured correctly to boot off this ISO. (Optional: you could always press F11 whilst the server is booting to choose the boot device manually.)

Attaching the virtual media ISO

7. Reboot the server if the boot from virtual CD/DVD has already passed
a. Go to Keyboard – Macros – Alt-Ctrl-Del to do this.

8. The ESX install should now start
a. Press Enter to go into graphical install mode.
b. Select Test to test the media (the ISO should generally be fine).
c. Select OK to start the install.
d. Choose the United Kingdom keyboard layout (or whichever keyboard layout you use).
e. Leave the mouse on Generic 3 Button USB.
f. Accept the license terms.

The ESX installer starting up


9. Partitioning
a. For partition options, leave on “Recommended”. It should now show the Dell virtual disk of 69GB (in this case), or whatever your Dell RAID virtual disk / disk configuration is.
b. Say “Yes” to removing all existing partitions on the disk (that is, if you don’t mind formatting and completely clearing out any existing data that may be on this disk).
c. Alter the partitions to match the best-practice sizes described at http://vmetc.com/2008/02/12/best-practices-for-esx-host-partitions/
d. Note: it doesn’t matter if some of these sizes end up 2-3MB out – the installer adjusts them slightly. The swap partition should be a minimum of 1600MB though.
e. The next page is Advanced Options – leave as is (Boot from SCSI drive).

The recommended partition layout

10. Network Configuration
a. Set up the network configuration:
b. IP address (x.x.x.x) – whatever IP you are assigning to this particular ESX host.
c. Subnet mask: 255.255.255.0, for example.
d. Gateway: your gateway IP address (x.x.x.x)
e. Primary DNS: (x.x.x.x)
f. Secondary DNS: (x.x.x.x)
g. Hostname: replace the default localhost.localdomain with your own, for example ESXhost01.shogan
h. VLAN ID – leave this blank if you are not using VLANs. If you are, specify the VLAN here.
i. Create a default network for virtual machines – unless you have a specific network configuration in mind, leave this ticked.

11. Time zone
a. Set the location to your own location.
b. Leave “System clock uses UTC” ticked.

12. Root password
a. Set the root password (this is your admin password!).

13. Finish installation
a. The next page is “About to Install”.
b. Check that the information is all correct and click Next if everything looks fine.

14. Change boot order back and restart the blade server.
a. Via the iDRAC page, change the boot order back to Hard disk for the blade so that it will reboot using the server’s RAID hard disks instead of the ISO.
b. Reboot the host by pressing the Finish button back in the console.
c. Disconnect the Virtual CD from the Media option in the console menu.
d. Watch the console while the server reboots to ensure no errors are reported on startup.

If all went well, you should now have an ESX host booted to the console. Press Alt-F1 to access the command line (you will need to log in as root or any other user you set up).
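A few standard service console commands are handy for a quick sanity check at this point – your output will obviously differ, and this is just a suggested starting point rather than a definitive checklist:

vmware -v          (shows the ESX version and build number)
esxcfg-nics -l     (lists the physical NICs and their link status)
esxcfg-vswif -l    (lists the Service Console network interfaces and their IP settings)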

You can now access your server via the web browser (https://x.x.x.x). From here you can download the Virtual Infrastructure client to manage the ESX Host with.

This host could now be further configured and added to an ESX cluster, for example. SANs could be assigned and vMotion set up so that HA (High Availability) and DRS (Distributed Resource Scheduler) can be put to good use!