Introduction
vSphere 8 has set a new standard for lifecycle and maintenance management in vSphere infrastructure, most recently with Partial Maintenance Mode. With vSphere Lifecycle Manager, cluster image management is streamlined and covers the full stack, including driver and firmware lifecycle management, which speeds up remediation of an entire cluster. Moreover, certificate management is non-disruptive, and vCenter updates can now be executed with significantly reduced downtime.
Maintenance mode is a well-established concept in vSphere with well-defined semantics: while a host is in maintenance mode, no VMs run on it and many operations are not allowed (long-running VM operations, NFC file transfers). Other operations, such as firmware updates and some patch manager operations, are allowed only during maintenance mode, since disruptive operations can be assumed safe in this mode. Usually, a host in maintenance mode will eventually be rebooted.
However, these restrictions require that powered-on VMs be migrated or otherwise dealt with (powered off, suspended to memory or disk), so maintenance mode is quite disruptive. There are now scenarios where only a subset of these restrictions is required to service a host, and others where additional restrictions are required. Instead of changing maintenance mode, which has well-defined semantics and guarantees, vSphere introduces the concept of partial maintenance modes.
What are partial maintenance modes?
A partial maintenance mode is a distinct host state designed to safely bring down a specific service for upgrade or removal. From the hostd perspective, a host can enter the mode only when certain conditions are met, and once it has entered, specific operations are restricted. Unlike full maintenance mode, which has stringent requirements, a partial maintenance mode is more flexible. The goal is to support several partial maintenance modes, each with its own set of rules prohibiting different operations as needed.
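One way to picture "each mode has its own set of rules" is a table that maps a mode to the operations it prohibits. The sketch below is purely illustrative: the names `Mode`, `Operation`, `PROHIBITED`, and `is_allowed` are hypothetical and are not part of hostd or any vSphere API. The prohibited sets are a simplification of the restrictions named in this article for full maintenance mode and for the partial maintenance mode used by live patching.

```python
from enum import Enum, auto


class Operation(Enum):
    """Illustrative host operations mentioned in this article."""
    RUN_VM = auto()
    LONG_RUNNING_VM_OP = auto()
    NFC_FILE_TRANSFER = auto()
    CREATE_VM = auto()
    MIGRATE_VM_IN = auto()
    MIGRATE_VM_OUT = auto()


class Mode(Enum):
    """Hypothetical host modes; not real hostd identifiers."""
    NORMAL = auto()
    FULL_MAINTENANCE = auto()
    LIVE_PATCH_PARTIAL = auto()


# Assumed rule table: each mode lists only the operations it prohibits.
PROHIBITED = {
    Mode.NORMAL: frozenset(),
    Mode.FULL_MAINTENANCE: frozenset({
        Operation.RUN_VM,
        Operation.LONG_RUNNING_VM_OP,
        Operation.NFC_FILE_TRANSFER,
    }),
    Mode.LIVE_PATCH_PARTIAL: frozenset({
        Operation.CREATE_VM,
        Operation.MIGRATE_VM_IN,
        Operation.MIGRATE_VM_OUT,
    }),
}


def is_allowed(mode: Mode, op: Operation) -> bool:
    """Return True if the operation is not prohibited in the given mode."""
    return op not in PROHIBITED[mode]
```

With this model, adding another partial maintenance mode means adding one more entry to the table rather than changing the meaning of full maintenance mode.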
Requirements for vSphere Live Patch
To leverage vSphere Live Patch, certain prerequisites must be met:
- vCenter Version: Must be 8.0 Update 3 or later.
- ESXi Hosts Version: Must be 8.0 Update 3 or later.
- Live Patch Settings: The Enforce Live Patch setting must be enabled in the global vSphere Lifecycle Manager remediation settings or the cluster remediation settings.
- DRS: Must be enabled on the vSphere cluster and in fully automated mode.
- vGPU-Enabled VMs: If the cluster runs vGPU-enabled VMs, Passthrough VM DRS Automation must be enabled.
- Patch Eligibility: The current build of the vSphere cluster must be eligible for a live patch.
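The prerequisites above can be read as a checklist. The following is a minimal sketch of such a checklist, assuming the cluster facts have already been collected by some other means. `ClusterState`, its fields, and `unmet_prerequisites` are hypothetical names for illustration, not vSphere API objects, and versions are encoded as `(major, minor, update)` tuples so that 8.0 Update 3 becomes `(8, 0, 3)`.

```python
from dataclasses import dataclass

MIN_VERSION = (8, 0, 3)  # 8.0 Update 3 as (major, minor, update); encoding is an assumption


@dataclass
class ClusterState:
    """Hypothetical snapshot of the facts the prerequisites refer to."""
    vcenter_version: tuple
    host_versions: list
    enforce_live_patch: bool
    drs_enabled: bool
    drs_fully_automated: bool
    has_vgpu_vms: bool
    passthrough_drs_automation: bool
    build_eligible: bool


def unmet_prerequisites(c: ClusterState) -> list:
    """Return a human-readable list of prerequisites that are not met."""
    problems = []
    if c.vcenter_version < MIN_VERSION:
        problems.append("vCenter must be 8.0 Update 3 or later")
    if any(v < MIN_VERSION for v in c.host_versions):
        problems.append("All ESXi hosts must be 8.0 Update 3 or later")
    if not c.enforce_live_patch:
        problems.append("Enforce Live Patch must be enabled")
    if not (c.drs_enabled and c.drs_fully_automated):
        problems.append("DRS must be enabled in fully automated mode")
    if c.has_vgpu_vms and not c.passthrough_drs_automation:
        problems.append("Passthrough VM DRS Automation must be enabled for vGPU VMs")
    if not c.build_eligible:
        problems.append("Current cluster build is not eligible for a live patch")
    return problems
```

An empty result means every listed prerequisite is satisfied; otherwise each entry names one item to fix before attempting a live patch.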
How Live Patching Works
- Partial Maintenance Mode: The ESXi host enters partial maintenance mode, allowing existing VMs to continue running but disallowing the creation of new VMs or migration of VMs to or from the host.
- Patch Mounting: A new revision of the target patch components is mounted in parallel to the current version.
- Patching: The files and processes in the newly mounted revision are patched.
- Fast-Suspend-Resume (FSR): Virtual machines undergo a fast-suspend-resume to consume the patched revision.
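The four steps above happen in order on each host. As a rough illustration of that ordering only, the sketch below builds a plain-text plan for one host. The function `plan_live_patch` and the host and VM names are hypothetical; nothing here calls vSphere, and the real workflow is driven by vSphere Lifecycle Manager.

```python
def plan_live_patch(host: str, vms: list) -> list:
    """Return the ordered live patch steps for one host as plain strings."""
    steps = [
        # Step 1: existing VMs keep running; new VMs and migrations are blocked.
        f"{host}: enter partial maintenance mode",
        # Step 2: the new revision is mounted in parallel to the current one.
        f"{host}: mount new revision of target patch components",
        # Step 3: files and processes in the new revision are patched.
        f"{host}: patch the new mount revision",
    ]
    # Step 4: each VM does a fast-suspend-resume to pick up the patched revision.
    steps += [f"{vm}: fast-suspend-resume onto patched revision" for vm in vms]
    return steps


if __name__ == "__main__":
    for line in plan_live_patch("esx01", ["vm-a", "vm-b"]):
        print(line)
```

The point of the ordering is that VMs are touched only in the last step, after the patched revision is already in place next to the running one.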
Patch Compatibility
vSphere Live Patch is initially available for patches targeting the virtual machine execution component of ESXi. Patches affecting other areas, like VMkernel patches, are not initially supported and will follow the existing patching workflow requiring maintenance mode and VM evacuation.
Limitations
vSphere Live Patch is not compatible with systems configured with TPM devices or systems configured with DPUs using the vSphere Distributed Services Engine. After successful remediation, hosts running VMs incompatible with FSR will still report being out of compliance until manual remediation is performed.
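Taken together, the compatibility and limitation rules amount to a small decision. The sketch below is one hedged reading of them. The names `choose_workflow` and `host_compliant_after_live_patch` and the component label `"vm-execution"` are hypothetical. Falling back to the maintenance mode workflow on TPM or DPU systems is an assumption, since the text only says that live patching is not compatible with them.

```python
def choose_workflow(patch_component: str, has_tpm: bool, has_dpu: bool) -> str:
    """Pick a patching workflow for a host; labels are illustrative only."""
    # Only patches to the virtual machine execution component are live patchable.
    if patch_component != "vm-execution":
        return "maintenance-mode"
    # Assumption: incompatible systems fall back to the existing workflow.
    if has_tpm or has_dpu:
        return "maintenance-mode"
    return "live-patch"


def host_compliant_after_live_patch(vm_fsr_compatible: list) -> bool:
    """A host stays out of compliance while any VM could not do FSR."""
    return all(vm_fsr_compatible)
```

For example, a VMkernel patch would be routed to the maintenance mode workflow regardless of hardware, and a host with even one FSR-incompatible VM would still be reported as out of compliance after a live patch.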