Have you ever asked yourself what happens when the USB device or SD card on which you have installed ESXi fails?

The good news is: nothing (in the short run).

ESXi is loaded into memory during boot, so there is no immediate impact on the running host when the boot device fails.

All virtual machines will continue to run as usual, and all services and features you know and love (e.g. vMotion) are still available, too.

Another piece of good news is that this failure is monitored by ESXi and you will receive a notification in vCenter:

"Lost connectivity to the device … backing the boot filesystem. As a result, host configuration changes will not be saved to persistent storage."

Of course, if you try to reboot the ESXi host now, the reboot will fail because the host has no working boot device!

What you can do is put the ESXi host into maintenance mode and shut it down. Then replace the broken SD card/USB device and reinstall and reconfigure ESXi.
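If you prefer to do this from the command line, the same steps can be done in the ESXi Shell. A minimal sketch, assuming an ESXi 5.x/6.0 host (verify the commands in your environment first):

    # Enter maintenance mode (migrate or power off all VMs beforehand)
    esxcli system maintenanceMode set --enable true

    # Check that the host is really in maintenance mode
    esxcli system maintenanceMode get

    # Power off the host so the SD card/USB device can be replaced
    esxcli system shutdown poweroff --reason "Replacing failed boot device"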

By the way, a hardware monitoring tool like HP SIM can also report faulty internal SD cards. You will get errors like "Embedded Flash/SD-CARD: failed restart".

While working on the vSphere 6 ESXTOP quick Overview for Troubleshooting poster, I came across the VAAI counters that are available with ESXTOP.

In the end I did not include them in the poster, because I doubt that they are relevant for troubleshooting in most cases.

But maybe someone will find this information useful, so here it is, summarized in a short blog post:

To display the VAAI stats, open ESXTOP and press "u" to show the disk devices.

Now press "f" to add or remove fields. In our case we want to display only the fields A (Device Name) and O (VAAI Stats):

[Screenshot: ESXTOP field selection with A (Device Name) and O (VAAI Stats) enabled]
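Side note: if you want to record these counters over time instead of watching them live, you can run ESXTOP in batch mode. A small sketch, with the sample interval and count chosen arbitrarily:

    # Capture all counters (including the VAAI stats) every 10 seconds, 30 samples
    esxtop -b -a -d 10 -n 30 > esxtop-vaai.csv

The resulting CSV file can then be analyzed e.g. with Windows perfmon or a spreadsheet.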

ESXTOP will now show you the following VAAI stats:

[Screenshot: ESXTOP VAAI statistics]

  • CLONE_RD – number of successfully completed CLONE commands where the disk device was the source
  • CLONE_WR – number of successfully completed CLONE commands where the disk device was the destination
  • CLONE_F – number of failed CLONE commands
  • MBC_RD/s – CLONE data read per second (in MB)
  • MBC_WR/s – CLONE data written per second (in MB)

Note:
CLONE_F shows failed VAAI CLONE commands and should be zero. CLONE_F > 0 implies that there is or was a constraint on the array; in this case ESXi falls back to a software copy.

  • ATS – number of successfully completed ATS (Atomic Test & Set) commands
  • ATSF – number of failed ATS commands

Note:
ATSF values > 0 are not a problem in the short run. They only mean that ESXi falls back to SCSI-2 reservations for metadata updates for some reason. However, if you see a high number of SCSI reservation conflicts in /var/log/vmkernel.log, you should investigate them.
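A quick way to check whether reservation conflicts are piling up is a simple grep on the vmkernel log. The exact message text can differ between ESXi versions, so treat this only as a starting point:

    # Count SCSI reservation conflict messages in the current vmkernel log
    grep -ci "reservation conflict" /var/log/vmkernel.log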

  • ZERO – number of successfully completed ZERO commands
  • ZERO_F – number of failed ZERO commands
  • MBZERO/s – data zeroed per second (in MB)
  • DELETE – number of successfully completed DELETE commands
  • DELETE_F – number of failed DELETE commands
  • MBDEL/s – data deleted per second (in MB)
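If you want to verify which VAAI primitives (ATS, Clone, Zero, Delete) a device actually supports, you can query the VAAI status via esxcli. A small sketch; the device identifier naa.xxx is only a placeholder:

    # Show the VAAI support status for all devices
    esxcli storage core device vaai status get

    # Or query a single device
    esxcli storage core device vaai status get -d naa.xxx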

Are you looking for more information about ESXTOP counters, or for a description of the counters and their thresholds?

Take a look at the ESXTOP Troubleshooting poster for vSphere 6. Or better yet, print it and hang it on your office wall:

ESXTOP vSphere 6

 

If you are using Hitachi HAM (High Availability Manager) with HUS VM, I would like to ask you to read this.

This post is about an issue and its impact that we noticed and were able to reproduce in a lab environment.

So I would be interested to hear whether you, as a HAM/HUS VM customer, have already noticed this issue too, and if so, whether you can provide me feedback (e.g. via Twitter @lessi001 or in the comments).

Interested? Here are the details:

Issue description/impacts:

If you are using HDS HAM with HUS VM, there are tasks where you have to manipulate storage paths using an HDS tool called "HGLM" (Hitachi Global Link Manager), e.g. to perform a failback after a storage failover or in a planned failover/failback scenario.

After manipulating a larger number of paths (e.g. setting more than 50 paths per host to off or active), we observed the following impacts:

  • ESXi host becomes unresponsive
  • performance logs are no longer recorded
  • up to 100 percent CPU load on the ESXi host

These impacts can last from one minute up to three hours, depending on the number of paths, the load on the ESXi host, and likely the condition of the storage system.


Cause of this behaviour:

HDS HGLM uses the "esxcli storage core path set" command for the path manipulation. Unfortunately, this command triggers a datastore rescan every time it is executed.

The rescan is necessary because the command changes the storage topology of the host, and the rescan brings the storage data in hostd/vCenter up to date (information from VMware support/engineering).

So if you manipulate, for example, 300 paths, the command also triggers 300 storage rescans!

This leads to the impacts described above, as hostd cannot cope with such a high rate of rescans.

In our reproduction scenario we tried the same path manipulation with the "localcli" command. This command does not trigger a rescan in hostd, which is why the issue is not exposed by "localcli".
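For reference, this is roughly what the two variants look like. The path name is only a placeholder, and keep in mind that localcli bypasses hostd, so it should normally only be used when VMware support advises it:

    # Path manipulation via esxcli - triggers a datastore rescan in hostd for every call
    esxcli storage core path set --path vmhba2:C0:T1:L5 --state off

    # The same operation via localcli - bypasses hostd, so no rescan is triggered
    localcli storage core path set --path vmhba2:C0:T1:L5 --state off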

I need your feedback:

If you are a HAM/HUS VM customer and have already noticed these impacts – please get in touch with me (Twitter @lessi001 or via the comments).

If you are a HAM/HUS VM customer and want to know whether you have the same issue, you can try to reproduce it in your lab environment. Just get in touch with me and I can tell you what we have tried.

You want to go to VMworld 2015 US or Europe, but your company/boss does not want to pay for the conference pass?

Then try your luck and win your pass to the show in one of the many sweepstakes.

In this post I try to put together a list of where and how you can win tickets for VMworld 2015 US and Europe. If one is missing, leave a comment or send me a DM on Twitter (@lessi001).

 

Infinio

Infinio is giving away a full conference pass for VMworld 2015 US. Follow the link and complete the form; the winner will be announced on Friday, July 17th.

VMTurbo VMworld 2015 Sweepstakes

VMTurbo is raffling off full conference passes in three drawings: May 29, June 19, and July 10. Register here to win!

EVO:RAIL Hands-on Lab Challenge

Between May 1, 2015 and August 20, 2015 you can win a full conference pass in the EVO:RAIL Hands-on Lab Challenge:

Each month, VMware will announce the winners:

The three participants with the best scores (shortest time to complete the lab) will each receive a 2015 full conference pass to their choice of VMworld San Francisco or VMworld Barcelona! Interested?

Try your luck here: EVO:RAIL Hands-on Lab Challenge

Business Mobility Summit (June 16, 2015)

Join the Business Mobility Summit and win a VMworld pass from CloudCredibility.com.

Read more about the event and register here: VMware Business Mobility Summit