HA component protection against APD

INF-BCO2807 – vSphere HA and Datastore Access Outages tech preview

All of you know by now that I have a love for availability-related topics… hence the reason I needed to write something about INF-BCO2807. The session, titled “vSphere HA and Datastore Access Outages – Current Capabilities Deep-Dive and Tech Preview” and presented by Keith Farkas and Smriti Desai, discussed possible future HA enhancements that would address component failures. Those of you who have read my whitepaper on stretched clusters will immediately see why this would be a nice enhancement!

Once again, a big fat disclaimer: VMware gives absolutely no guarantees about when, or even if, this will be released.

This session was all about inaccessible datastores. During our talk, Lee Dilworth and I explained the difference between a Permanent Device Loss (PDL) and an All Paths Down (APD) condition. In short, a PDL is signaled by a SCSI sense code issued by the storage system (or by an iSCSI “login reject”, for that matter). This SCSI sense code allows vSphere (both the kernel and HA) to respond and act upon it. In the case of an APD, vSphere cannot respond: the LUN is gone on that host and we don’t know why, so what do we do? Well, with vSphere 5.1 and prior, we do nothing. This results in zombie virtual machines, and that is not the state you want your virtual machines to be in, right?
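
To make the distinction concrete, here is a minimal Python sketch (not VMware code) of the logic described above: a PDL is something the array actively reports, while an APD is the absence of any answer on every path. The `PathResult` and `DeviceState` types and the `classify` function are hypothetical. The single entry in `PDL_SENSE_CODES` (sense key 0x5, ASC 0x25, ASCQ 0x00, "LOGICAL UNIT NOT SUPPORTED") is one of the codes ESXi treats as PDL, but the real list is longer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

class DeviceState(Enum):
    OK = "ok"
    PDL = "permanent device loss"
    APD = "all paths down"

# Hypothetical subset of PDL sense data as (sense key, ASC, ASCQ).
# 0x5/0x25/0x00 = LOGICAL UNIT NOT SUPPORTED; ESXi recognizes more than this.
PDL_SENSE_CODES = {(0x5, 0x25, 0x00)}

@dataclass
class PathResult:
    responded: bool                                # did the array answer on this path?
    sense: Optional[Tuple[int, int, int]] = None   # sense data, if a check condition came back

def classify(paths: List[PathResult]) -> DeviceState:
    """Classify a device from the result of probing each of its paths."""
    if any(p.responded and p.sense in PDL_SENSE_CODES for p in paths):
        # The array told us explicitly that the LUN is gone: we can act on it.
        return DeviceState.PDL
    if not any(p.responded for p in paths):
        # No answer on any path (or no paths at all): we cannot tell why.
        return DeviceState.APD
    return DeviceState.OK
```

Note the asymmetry: the PDL branch has a concrete reason to act on, while the APD branch can only observe silence, which is why APD is the hard case.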

So how is VMware planning to solve this? By enhancing HA with what was referred to as “Component Protection”. Component Protection allows a per-virtual-machine response when an APD or PDL condition has been detected. This is not based on guest I/Os failing, but on the vSphere platform declaring that the device is in a PDL or APD state.

When an APD scenario is detected, HA will be smart enough to understand which hosts can restart the virtual machines, as in some cases multiple hosts might be impacted. Of course, it will also only kill your virtual machine and restart it when it knows capacity is available for it.
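
Because this was a tech preview, nothing was said about how HA would actually implement this, so the following is only a toy Python sketch of the two rules just described: never restart a virtual machine on a host that suffers the same outage, and only kill and restart it when another host has room. The `Host`, `VM`, `pick_restart_host` and `respond_to_apd` names are hypothetical, and capacity is reduced to free memory for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Host:
    name: str
    free_mem_mb: int
    datastores: Set[str] = field(default_factory=set)  # datastores this host can still reach

@dataclass
class VM:
    name: str
    mem_mb: int
    datastore: str

def pick_restart_host(vm: VM, hosts: List[Host], failed_host: str) -> Optional[Host]:
    """Return a host that can run vm, or None if no safe target exists."""
    candidates = [
        h for h in hosts
        if h.name != failed_host
        and vm.datastore in h.datastores   # skip hosts hit by the same APD
        and h.free_mem_mb >= vm.mem_mb     # only restart where capacity exists
    ]
    # Prefer the host with the most headroom (an arbitrary policy for this sketch).
    return max(candidates, key=lambda h: h.free_mem_mb, default=None)

def respond_to_apd(vm: VM, hosts: List[Host], failed_host: str) -> str:
    """Decide what to do with one VM whose datastore went APD on failed_host."""
    target = pick_restart_host(vm, hosts, failed_host)
    if target is None:
        # No capacity on an unaffected host: leave the VM in place rather than kill it.
        return f"leave {vm.name} running on {failed_host}"
    target.free_mem_mb -= vm.mem_mb  # reserve capacity before the restart
    return f"kill {vm.name} on {failed_host}, restart on {target.name}"
```

Calling `respond_to_apd` once per virtual machine mirrors the per-VM response; because capacity is reserved as each decision is made, later virtual machines may be left in place once the unaffected hosts are full.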

I don’t know about you, but I would rather see this implemented today than tomorrow! APD is not common, but it is also not rare… and when disaster strikes, it strikes hard!

I don’t think this session is scheduled for VMworld Europe, so make sure to watch the recording as soon as it is available; it is well worth your time. Keith and Smriti gave an excellent deep-dive on the current vSphere HA and a nice look into the future!

Source: Yellow-Bricks

ESXi is telling fibs to backup software

Redditor “tottenham12712” has pieced together a scary scenario for VMware users: ESXi might be feeding dud data to backup software.

VMware knows it has a problem. Indeed, it has explained in a knowledge base article that when vAdmins expand a VMDK file past 128 GB while Changed Block Tracking (CBT) is enabled, the QueryChangedDiskAreas("*") call returns an incorrect list of virtual machine disk sectors.

As tottenham explains, that’s bad news because some backup software relies on that data to determine what to back up.
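
A minimal Python sketch shows why a wrong changed-areas list is so dangerous. It models a disk as a list of blocks and an incremental backup that copies only the blocks reported as changed; `incremental_backup` and the block-index model are hypothetical simplifications (the real QueryChangedDiskAreas call reports extents as byte offsets and lengths).

```python
from typing import List

def incremental_backup(previous: List[bytes], disk: List[bytes],
                       reported_changes: List[int]) -> List[bytes]:
    """Build a new restore point from the previous one plus the reported changes.

    Only blocks listed in reported_changes are read from the live disk;
    everything else is trusted to be unchanged since the previous backup.
    """
    # If the disk was expanded, pad the restore point to the new size.
    backup = previous + [b"\x00"] * (len(disk) - len(previous))
    for block in reported_changes:
        backup[block] = disk[block]
    return backup

previous = [b"A", b"B", b"C"]
disk = [b"A", b"X", b"C", b"D"]  # block 1 rewritten, disk grown by one block

good = incremental_backup(previous, disk, [1, 3])  # correct change list
bad = incremental_backup(previous, disk, [3])      # change list missing block 1
```

Both calls complete without error; the second restore point silently holds stale data in block 1, which only shows up when someone tries to restore from it.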

Tottenham12712 goes on to quote an email he’s received from Veeam containing the following grim news:

“But the main point is that your backups and replicas for all VMs that had its virtual disk size expanded beyond 128 GB at some point may be unrecoverable.”

The good news is that there seems to be a simple fix: turning it off and on again, with the “it” being Changed Block Tracking.

VMware is working on a proper fix, and Veeam is working on updates to its software to take the flaw into account.

Source: theregister.co.uk