VM Component Protection (VMCP) – What is it?


I was recently working in my vSphere 6.0 lab environment and came across a very useful feature: VM Component Protection (VMCP).

This is a new option in a cluster's vSphere HA settings. It protects VMs against Permanent Device Loss (PDL) and All Paths Down (APD) conditions, which are caused by storage connectivity failures and misconfigurations.

Before we go any further, let us first understand what Permanent Device Loss (PDL) and All Paths Down (APD) are.

Permanent Device Loss (PDL)

A Permanent Device Loss situation occurs when a LUN presented to the ESXi host becomes permanently unavailable. In the Storage Adapters view, the device is reported as Lost Communication.

You may be wondering what causes a PDL. Some common causes are listed below:

  • Array misconfiguration.
  • Removing the ESXi host from the array’s storage group.
  • A LUN failure on the storage array.
  • Incorrect zoning configuration that makes the LUN unavailable.

When an unplanned PDL occurs, the host stops sending I/O requests to the storage array even though the paths are up and accessible.

The host decides to stop sending I/O based on the SCSI sense codes that the storage array returns over those paths, indicating that the device is unavailable.
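To make that decision concrete, here is a simplified Python sketch of how sense data could be classified. It is not ESXi's actual logic: it checks only one commonly cited PDL sense code, 0x5/0x25/0x0 (ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED), whereas a real host recognises more codes. The `SenseData` type and both functions are hypothetical names used for illustration.

```python
# Simplified sketch: does the sense data returned by the array signal a PDL?
# Assumption: only one commonly cited PDL code is checked; ESXi knows more.
from typing import NamedTuple


class SenseData(NamedTuple):
    sense_key: int  # e.g. 0x5 = ILLEGAL REQUEST
    asc: int        # additional sense code
    ascq: int       # additional sense code qualifier


# 0x5/0x25/0x0 = ILLEGAL REQUEST, LOGICAL UNIT NOT SUPPORTED
PDL_SENSE_CODES = {SenseData(0x5, 0x25, 0x0)}


def is_pdl(sense: SenseData) -> bool:
    """True if the array reported that the LUN is permanently gone."""
    return sense in PDL_SENSE_CODES


def next_action(sense: SenseData) -> str:
    # The paths may still be up, but a PDL code means retrying is pointless.
    return "stop I/O to device (PDL)" if is_pdl(sense) else "keep retrying I/O"


print(next_action(SenseData(0x5, 0x25, 0x0)))  # stop I/O to device (PDL)
# 0x2/0x4/0x2 = NOT READY, a transient condition rather than a PDL
print(next_action(SenseData(0x2, 0x4, 0x2)))   # keep retrying I/O
```

The key point is that PDL is driven by an explicit answer from the array; without such a sense code, the host cannot tell a dead LUN from a temporary outage.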

All Paths Down (APD)

I personally do not like this condition because the ESXi host is left in a state where it does not know whether the device is permanently unavailable or not.

This condition occurs when the storage target suddenly becomes unavailable to the ESXi host without any prior notification, that is, without the array returning a PDL sense code.

Common causes of an APD condition include:

  • FC switch failures.
  • Momentary network outages.
  • FC HBA failures.

Since the device might become available again, the ESXi host keeps retrying I/O operations for 140 seconds, known as the APD timeout period, and then declares an APD timeout for the device.

Now that we are familiar with the terminology and the causes of these issues, let us get into the meat of the article.
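The retry window can be modelled with a small Python sketch that uses a simulated clock instead of real sleeps. The 140-second default comes from the text above; the 10-second retry interval, the `device_is_back` callback, and the function name are hypothetical simplifications, not how ESXi schedules its retries.

```python
# Hypothetical model of the APD retry window, using a simulated clock.
from typing import Callable

APD_TIMEOUT_SEC = 140  # default APD timeout on ESXi


def retry_until_apd_timeout(
    device_is_back: Callable[[int], bool],
    timeout: int = APD_TIMEOUT_SEC,
    interval: int = 10,  # assumed retry interval, for illustration only
) -> str:
    """Retry I/O every `interval` seconds until the device responds or the timeout elapses."""
    elapsed = 0
    while elapsed < timeout:
        if device_is_back(elapsed):
            # Paths came back in time: normal I/O resumes, no APD timeout.
            return f"device recovered after {elapsed}s, I/O resumes"
        elapsed += interval
    # The device never answered within the window.
    return f"APD timeout after {timeout}s"


print(retry_until_apd_timeout(lambda t: t >= 30))  # short outage
print(retry_until_apd_timeout(lambda t: False))    # device never returns
```

A momentary network outage therefore never reaches the APD timeout, while a longer failure does.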

How to enable VM Component Protection (VMCP)?

Log in to the vSphere Web Client to enable the VM Component Protection setting on the cluster.

Edit the cluster settings and select the checkbox labelled Protect against Storage Connectivity Loss.

[Figure: VM Component Protection settings in the cluster's vSphere HA configuration]

As shown in the figure above, the following options are available:

  • Response for Datastore with Permanent Device Loss (PDL)
    -- Disabled – No action will be taken against the affected VMs.
    -- Issue events – No action will be taken against the affected VMs; only an event is generated when a PDL occurs.
    -- Power off and restart VMs – The affected VMs are powered off and restarted on healthy hosts, as HA normally would.
  • Response for Datastore with All Paths Down (APD)
    -- Disabled – No action will be taken against the affected VMs.
    -- Issue events – No action will be taken against the affected VMs; only an event is generated when an APD occurs.
    -- Power off and restart VMs (conservative) – Restart only if there is sufficient capacity on healthy hosts.
    -- Power off and restart VMs (aggressive) – Attempts to restart the affected VMs without checking for available resources first. Some VMs might not be restarted if the other hosts in the cluster do not have enough resources.
  • Delay for VM failover for APD -- Once the APD timeout has been reached (default: 140 seconds), VMCP waits an additional period (default: 3 minutes) before taking action against the affected VMs. You can increase or decrease this value based on cluster requirements.
  • Response for APD recovery after APD timeout -- Tells vSphere HA what to do if an APD condition clears after the APD timeout was reached but before the Delay for VM failover has expired.
    -- Disabled – No action will be taken against the affected VMs.
    -- Reset VMs – Hard reset of the VMs.
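The difference between the conservative and aggressive APD responses can be illustrated with a hypothetical Python model. It is not vSphere HA's implementation: "capacity" is reduced to a count of free VM slots on healthy hosts, and the enum and function names are illustrative only. The conservative branch follows the description above (restart only when there is capacity), and the aggressive branch powers VMs off regardless, so some restarts can fail.

```python
# Hypothetical model of the APD response options; not vSphere HA's code.
from enum import Enum


class ApdResponse(Enum):
    DISABLED = "Disabled"
    ISSUE_EVENTS = "Issue events"
    RESTART_CONSERVATIVE = "Power off and restart VMs (conservative)"
    RESTART_AGGRESSIVE = "Power off and restart VMs (aggressive)"


def respond_to_apd(setting: ApdResponse, affected_vms: list[str], free_slots: int) -> list[str]:
    """Actions taken once the Delay for VM failover has expired."""
    if setting is ApdResponse.DISABLED:
        return []
    if setting is ApdResponse.ISSUE_EVENTS:
        return [f"event: APD affecting {vm}" for vm in affected_vms]
    actions = []
    for vm in affected_vms:
        has_room = free_slots > 0
        if setting is ApdResponse.RESTART_CONSERVATIVE and not has_room:
            # Conservative: never power off a VM that cannot be restarted elsewhere.
            actions.append(f"leave {vm} running (no capacity)")
            continue
        actions.append(f"power off {vm}")
        if has_room:
            actions.append(f"restart {vm} on healthy host")
            free_slots -= 1
        else:
            # Aggressive: the VM was powered off, but there is nowhere to restart it.
            actions.append(f"restart of {vm} failed (no capacity)")
    return actions


def on_apd_cleared_before_failover(reset_vms: bool, affected_vms: list[str]) -> list[str]:
    """'Response for APD recovery after APD timeout': Disabled or Reset VMs."""
    return [f"hard reset {vm}" for vm in affected_vms] if reset_vms else []


print(respond_to_apd(ApdResponse.RESTART_CONSERVATIVE, ["vm1", "vm2"], free_slots=1))
print(respond_to_apd(ApdResponse.RESTART_AGGRESSIVE, ["vm1", "vm2"], free_slots=1))
```

With only one free slot, the conservative setting leaves vm2 running on the affected host, while the aggressive setting powers it off and then fails to restart it.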


The image below shows the VM Component Protection (VMCP) recovery timeline:

[Figure: VM Component Protection (VMCP) recovery timeline]

Image Credit: VMware
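With the default values quoted earlier, the timeline reduces to simple arithmetic: 140 seconds of APD timeout plus the 180-second (3-minute) Delay for VM failover means VMCP acts 320 seconds, or 5 minutes 20 seconds, after all paths go down. The Python sketch below encodes that sequence; `vmcp_phase` is a hypothetical helper that ignores the case where the APD clears early.

```python
# Worked example of the VMCP timeline using the default values from the text.
APD_TIMEOUT_SEC = 140        # default APD timeout
VM_FAILOVER_DELAY_SEC = 180  # default Delay for VM failover for APD (3 minutes)


def vmcp_phase(t: int, timeout: int = APD_TIMEOUT_SEC, delay: int = VM_FAILOVER_DELAY_SEC) -> str:
    """Phase of an APD event t seconds after all paths went down."""
    if t < timeout:
        return "APD detected, host retrying I/O"
    if t < timeout + delay:
        # The failover delay starts counting once the APD timeout is reached.
        return "APD timeout, waiting for VM failover delay"
    return "VMCP takes action against affected VMs"


total = APD_TIMEOUT_SEC + VM_FAILOVER_DELAY_SEC
print(total, divmod(total, 60))  # 320 (5, 20): 5 minutes 20 seconds
for t in (0, 140, 319, 320):
    print(t, vmcp_phase(t))
```

Lowering the Delay for VM failover shortens the second phase, trading faster recovery for a higher chance of restarting VMs during an outage that would have cleared on its own.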

This is a very handy feature, and I will be recommending it to customers who aren’t aware of it. It can significantly reduce the downtime of VMs affected by storage problems.

I hope this has been informative and thank you for reading!


About Author

I am Adil Arif, working as a Senior Technical Support Engineer at Rubrik as well as an independent blogger and founder of Enterprise Daddy. In my current role, I am supporting infrastructure related to Windows and VMware datacenters.
