Hostname: ABC-P-F06-SY02.xyz.abc.com
ESXi Version: ESXi 6.5 Update 3
Time of Issue: 11:59 , 27th July 2021
Time in GMT: 5:59 PM, 27th July 2021
VM Name: ABC-p-qv-mq1
- While
Looking at the Vmware logs for the VM we can see that the Snapshot Request
came with Lazy=1
2021-07-27T17:59:57.301Z| vmx| I125:
VigorTransportProcessClientPayload: opID=kq6q6uj9-6inz2-h5:70166868-20-ee-d30c
seq=5261055: Receiving Snapshot.Take request.
2021-07-27T17:59:57.301Z|
vmx| I125: SnapshotVMX_TakeSnapshot start: ‘VM Snapshot 7%252f27%252f2021, 11:59:45
PM’, deviceState=1, lazy=1, quiesced=0, forceNative=0, tryNative=1,
saveAllocMaps=0 cb=C1B6B98A10, cbData=C1B84B3B00
2021-07-27T17:59:57.313Z| vmx| I125: SNAPSHOT:
SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed
successfully).
2021-07-27T17:59:57.314Z| vcpu-0| I125: Destroying
virtual dev for scsi0:0 vscsi=8816
2021-07-27T17:59:57.314Z| vcpu-0| I125: Destroying virtual dev for scsi0:1
vscsi=8817
2021-07-27T17:59:57.314Z| vcpu-0| I125: Destroying virtual dev for scsi0:2
vscsi=8818
2021-07-27T17:59:57.443Z| vcpu-0| I125: Progress -1%
(msg.checkpoint.saveStatusWithName)
2021-07-27T17:59:57.443Z| vcpu-0| I125: Checkpointed in VMware ESX, 6.5.0,
build-14320405, Linux Host
2021-07-27T17:59:57.474Z| vcpu-0| I125: Progress 0% (none)
2021-07-27T17:59:57.571Z| vcpu-0| I125: Progress 101% (none)
2021-07-27T17:59:57.571Z| vcpu-0| I125: Closing all the disks of the VM.
2021-07-27T17:59:57.571Z|
vcpu-0| I125: Closing disk ‘scsi0:2’
2021-07-27T17:59:57.572Z|
vcpu-0| I125: Closing disk ‘scsi0:1’
2021-07-27T17:59:57.572Z|
vcpu-0| I125: Closing disk ‘scsi0:0’
- Here we
can see that the VM was stun only for 0.44 Seconds:
2021-07-27T17:59:57.757Z| vcpu-0| I125:
Checkpoint_Unstun: vm stopped for 443003 us
- However
since we have selected the Snapshot with Memory there is a Lazy IO Task
which is initiated and took around 5 Seconds.
.2021-07-27T17:59:58.420Z|
Worker#2| I125: MainMem: Begin lazy IO (16777216 pages, 0 done, 1 threads, bio
= 0).
.
.
2021-07-27T18:04:37.460Z| Worker#2|
I125: MainMem: End lazy IO (16777216 done, sync = 0, error = 0).
2021-07-27T18:04:37.465Z| vmx| I125:
VigorTransport_ServerSendResponse opID=kq6q6uj9-6inz2-h5:70166868-20-ee-d30c
seq=5261055: Completed Snapshot request.
VM Name: ABC-p-qv-mq2
- Another
Operation triggered on 18:05.
2021-07-27T18:05:10.815Z| vmx| I125:
VigorTransportProcessClientPayload: opID=kq6q6uj9-6ir7b-h5:70166955-98-2d-d5da
seq=5236325: Receiving Snapshot.Take request.
2021-07-27T18:05:10.815Z|
vmx| I125: SnapshotVMX_TakeSnapshot start: ‘VM Snapshot 7%252f28%252f2021,
12:05:01 AM’, deviceState=1, lazy=1, quiesced=0, forceNative=0, tryNative=1,
saveAllocMaps=0 cb=38A4790A10, cbData=38A5E8BA00
2021-07-27T18:05:10.827Z| vmx| I125: SNAPSHOT:
SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed
successfully).
2021-07-27T18:05:10.828Z| vcpu-0| I125: Destroying virtual dev for scsi0:0
vscsi=8198
2021-07-27T18:05:10.828Z| vcpu-0| I125: Destroying virtual dev for scsi0:1
vscsi=8199
2021-07-27T18:05:10.828Z| vcpu-0| I125: Destroying virtual dev for scsi0:2
vscsi=8200
2021-07-27T18:05:10.959Z| vcpu-0| I125: Progress -1%
(msg.checkpoint.saveStatusWithName)
2021-07-27T18:05:10.982Z| vcpu-0| I125: Progress 0% (none)
2021-07-27T18:05:11.079Z| vcpu-0| I125: Progress 101% (none)
2021-07-27T18:05:11.079Z| vcpu-0| I125: Closing all the disks of the VM.
2021-07-27T18:05:11.079Z| vcpu-0| I125: Closing disk ‘scsi0:2’
2021-07-27T18:05:11.081Z|
vcpu-0| I125: Closing disk ‘scsi0:1’
2021-07-27T18:05:11.081Z|
vcpu-0| I125: Closing disk ‘scsi0:0’
- This
operation stun the VM for same 0.44 seconds:
2021-07-27T18:05:11.274Z| vcpu-0| I125:
Checkpoint_Unstun: vm stopped for 445986 us
- However
since the Snapshot with Memory was selected we can see the Begin and End
Lazy IO Operation took around 4 Seconds.
2021-07-27T18:05:11.982Z| Worker#1|
I125: MainMem: Begin lazy IO (16777216 pages, 0 done, 1 threads, bio = 0).
2021-07-27T18:09:24.836Z| Worker#1| I125: MainMem: End lazy IO (16777216
done, sync = 0, error = 0).
2021-07-27T18:09:24.840Z| vmx| I125: VigorTransport_ServerSendResponse
opID=kq6q6uj9-6ir7b-h5:70166955-98-2d-d5da seq=5236325: Completed Snapshot
request.
VM Name: ABC-p-qv-mq3
- Sane kind
of scenario happened with Mq3 as well, the Stun Time took around 0.44
Seconds and the Memory copy took around 4 Seconds.
2021-07-27T18:11:09.234Z| vmx| I125:
VigorTransportProcessClientPayload: opID=kq6q6uj9-6iuc5-h5:70166996-67-53-d8ab
seq=5206468: Receiving Snapshot.Take request.
2021-07-27T18:11:09.235Z|
vmx| I125: SnapshotVMX_TakeSnapshot start: ‘VM Snapshot 7%252f28%252f2021,
12:10:59 AM’, deviceState=1, lazy=1, quiesced=0, forceNative=0, tryNative=1,
saveAllocMaps=0 cb=B6870B7A10, cbData=B68865F9A0
2021-07-27T18:11:09.245Z| vmx| I125: SNAPSHOT:
SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed
successfully).
2021-07-27T18:11:09.246Z| vcpu-0| I125: Destroying virtual dev for scsi0:0
vscsi=8207
2021-07-27T18:11:09.246Z| vcpu-0| I125: Destroying virtual dev for scsi0:1
vscsi=8208
2021-07-27T18:11:09.246Z| vcpu-0| I125: Destroying virtual dev for scsi0:2
vscsi=8209
2021-07-27T18:11:09.378Z| vcpu-0| I125: Progress -1%
(msg.checkpoint.saveStatusWithName)
2021-07-27T18:11:09.406Z| vcpu-0| I125: Progress 0% (none)
2021-07-27T18:11:09.503Z| vcpu-0| I125: Progress 101% (none)
2021-07-27T18:11:09.503Z| vcpu-0| I125: Closing all the disks of the VM.
2021-07-27T18:11:09.503Z| vcpu-0| I125: Closing disk ‘scsi0:2’
2021-07-27T18:11:09.504Z|
vcpu-0| I125: Closing disk ‘scsi0:1’
2021-07-27T18:11:09.505Z|
vcpu-0| I125: Closing disk ‘scsi0:0’
2021-07-27T18:11:09.702Z|
vcpu-0| I125: Checkpoint_Unstun: vm stopped for 456295 us
2021-07-27T18:11:10.430Z| Worker#0| I125: MainMem: Begin lazy IO
(16777216 pages, 0 done, 1 threads, bio = 0).
2021-07-27T18:15:17.642Z| Worker#0| I125: MainMem: End lazy IO (16777216
done, sync = 0, error = 0).
2021-07-27T18:15:17.647Z| vmx| I125: VigorTransport_ServerSendResponse
opID=kq6q6uj9-6iuc5-h5:70166996-67-53-d8ab seq=5206468: Completed Snapshot
request.
Conclusion:
- Generally,
while performing a Change its always recommended to have healthy backup or
a Snapshot of the Virtual machine so that Incase if there are any issue
with the process we can always revert back to the Healthy state.
- Based on
the logs we can see that the Snapshot Operation stun the Virtual Machine
for 0.44 Seconds which should not be much of a concern. However, the Since
we have selected Memory Copy, it took nearly 4 Seconds of its own which
can cause issues with a IO Sensitive Virtual Machine.
Action Plan:
- I will
recommend you check if it’s possible to take Offline Snapshots for this IO
Sensitive Virtual Machines as it will release all the Pending IO from
Memory and make the Snapshot operation more consistent.
- Incase if
it’s not possible then, I will recommend you to take a snapshot without
Memory as these snapshot required the Memory content to be copied which
might affect a IO Sensitive workload.