Time of Issue: 1.6.2021 10:51 AM IST And 11:38
IST
Time in GMT:1.6.2021 5:21 AM and 6:08 AM GMT
Datastore Name |
naa / t10 / eui ID |
Volume UUID |
high-prstrg0014_db02_14 |
naa.624a937026b560ffb1cd487e00011460 |
60b5c460-617cb367-4737-0090fa7c8dca |
high-prstrg0011_db02_03 |
naa.624a93701b816ef1cff2467f00011046 |
5f92c095-3c857356-6000-0090faa2049a |
high-prstrg0014_db02_11 |
naa.624a937026b560ffb1cd487e0001144a |
5ef0d215-6e539201-b768-0090faf18798 |
high-prstrg0014_db02_02 |
naa.624a937026b560ffb1cd487e000113f3 |
5f88f3b8-2a12034e-1a14-0090faa2049a |
high-prstrg005_db02_04 |
naa.624a9370d79f4b7c182440e700011042 |
5f8d8bc2-5a38a544-7b58-0090faa2049a |
VM Name: ds-abc202-104z
- From the
Vmware logs we can see that the Vmware Tools lost the Heartbeat to the
Virtual Machine. Since the VM Monitoring was enabled, the Virtual Machine
was Terminated and brought back online.
2021-06-01T05:17:39.278Z| vmx| I125:
VigorTransport_ServerSendResponse opID=7fa53267-01-02-79-2d0a seq=1469947:
Completed Sched request.
2021-06-01T05:17:57.093Z|
vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2021-06-01T05:18:05.071Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2021-06-01T05:18:05.071Z|
vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2021-06-01T05:18:05.071Z|
vcpu-0| I125: Tools: Changing running status: 1 => 0.
2021-06-01T05:18:17.097Z|
vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2021-06-01T05:18:17.097Z|
vmx| I125: GuestRpc: app toolbox’s second ping timeout; assuming app is down
2021-06-01T05:18:17.097Z|
vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
2021-06-01T05:18:17.097Z|
vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is
already completed
2021-06-01T05:21:12.819Z|
vmx| I125: VigorTransportProcessClientPayload: opID=SWI-537d5f48-31eb
seq=1470130: Receiving MKS.RefreshScreenshot request.
2021-06-01T05:21:12.819Z|
svga| I125: MKSScreenShotMgr: Taking a screenshot
2021-06-01T05:21:12.883Z|
svga| I125: VigorTransport_ServerSendResponse opID=SWI-537d5f48-31eb
seq=1470130: Completed MKSVMX request.
2021-06-01T05:21:12.916Z|
vmx| I125: VigorTransportProcessClientPayload:
opID=lro-848-58631ed5-SWI-673be834-31f7 seq=1470144: Receiving
Sched.SetResourceGroup request.
2021-06-01T05:21:12.916Z|
vmx| I125: VigorTransport_ServerSendResponse
opID=lro-848-58631ed5-SWI-673be834-31f7 seq=1470144: Completed Sched request.
2021-06-01T05:21:12.916Z|
vmx| I125: VigorTransportProcessClientPayload:
opID=lro-848-58631ed5-SWI-673be834-31f7 seq=1470145: Receiving
PowerState.InitiateReset request.
2021-06-01T05:21:12.916Z|
vmx| I125: Vix: [7883256 vmxCommands.c:686]: VMAutomation_Reset. Trying hard
reset
- VM was
Powered On:
2021-06-01T05:21:53.902Z| vmx| W115: PowerOn
- Post that
VM again lost the Heartbeat to the Virtual Machine and the System Crashed
the VM to bring it back online.
2021-06-01T06:02:56.928Z| vcpu-0| I125: Tools: Tools
heartbeat timeout.
2021-06-01T06:02:56.928Z|
vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2021-06-01T06:02:56.928Z|
vcpu-0| I125: Tools: Changing running status: 1 => 0.
2021-06-01T06:03:01.228Z|
vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2021-06-01T06:04:33.302Z|
vcpu-0| I125: GuestMsg: Cannot close channel 0: it is not opened
2021-06-01T06:04:33.339Z|
vcpu-0| I125: Tools: Running status rpc handler: 0 => 1.
2021-06-01T06:04:33.339Z|
vcpu-0| I125: Tools: Changing running status: 0 => 1.
2021-06-01T06:04:35.577Z|
vcpu-0| I125: Guest: *** WARNING: GuestInfo collection interval longer than
expected; actual=136 sec, expected=30 sec. ***
2021-06-01T06:04:35.577Z|
vcpu-0| I125+
2021-06-01T06:04:37.163Z|
vcpu-0| W115: GuestRpc: application toolbox, changing channel 65535 -> 0
2021-06-01T06:04:37.163Z|
vcpu-0| I125: GuestRpc: Channel 0, guest application toolbox.
2021-06-01T06:04:37.461Z|
vcpu-0| I125: TOOLS autoupgrade protocol version 2
2021-06-01T06:04:37.461Z|
vcpu-0| I125: Vix: [8354420 mainDispatch.c:4151]:
VMAutomationReportPowerStateChange: Reporting power state change (opcode=2,
err=0).
2021-06-01T06:04:37.462Z|
vcpu-0| I125: TOOLS Received tools.set.version rpc call, version = 10346,
setting type to 2 from guest OS
2021-06-01T06:04:37.462Z|
vcpu-0| I125: Tools_SetVersionAndType did nothing; new tools version (10346)
and type (2) match old Tools version and type
2021-06-01T06:04:57.167Z|
vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2021-06-01T06:05:00.762Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2021-06-01T06:05:00.762Z|
vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2021-06-01T06:05:00.762Z|
vcpu-0| I125: Tools: Changing running status: 1 => 0.
2021-06-01T06:05:17.170Z|
vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2021-06-01T06:05:17.170Z|
vmx| I125: GuestRpc: app toolbox’s second ping timeout; assuming app is down
2021-06-01T06:05:17.170Z|
vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
2021-06-01T06:08:07.652Z|
vmx| I125: VigorTransportProcessClientPayload:
opID=lro-853-57d52fb1-SWI-53061dc0-6c8f seq=1473022: Receiving
Sched.SetResourceGroup request.
2021-06-01T06:08:07.652Z|
vmx| I125: VigorTransport_ServerSendResponse
opID=lro-853-57d52fb1-SWI-53061dc0-6c8f seq=1473022: Completed Sched request.
2021-06-01T06:08:07.652Z|
vmx| I125: VigorTransportProcessClientPayload:
opID=lro-853-57d52fb1-SWI-53061dc0-6c8f seq=1473023: Receiving
PowerState.InitiateReset request.
2021-06-01T06:08:07.652Z|
vmx| I125: Vix: [7883256 vmxCommands.c:686]: VMAutomation_Reset. Trying hard
reset
Hostname: xyz203-12.prod.abc.local
Uptime is 442.0 days,
ESXi Version: ESXi 6.5 Update 3
VOBD Logs:
- Found no
events related to the Datastore where the VM’s VMDK were residing.
FDM Logs:
- From the
FDM Logs we can see the same set of events, the Guest OS Stopped
responding to the Heartbeats and the Status turned from Green to Red.
- After waiting for some time when the Recheck timeout completed, the
Virtual machine was restarted by the HA.
2021-06-01T05:17:59.790Z verbose fdm[5EC8B70]
[Originator@6876 sub=Policy opID=SWI-7e51a8f]
[LocalVmMonitoringPolicy::ProcessElements(VmHeartbeatStateChange)] VM
/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx:
guest OS heartbeat transitioned to yellow
2021-06-01T05:18:02.011Z
verbose fdm[5BFDB70] [Originator@6876 sub=Monitor]
[IsoAddressMonitor::UpdateAddressList] Isolation addresses have changed
2021-06-01T05:18:05.075Z
verbose fdm[5BFDB70] [Originator@6876 sub=Invt opID=SWI-5232a655]
[VmHeartbeatStateChange::SaveToInventory] vm
/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx
changed toolsStatus=toolsNotRunning
2021-06-01T05:18:27.492Z
verbose fdm[5E46B70] [Originator@6876 sub=Election opID=SWI-60b7acd9]
CheckVersion: Version[1] Other host GT : 987 > 986
2021-06-01T05:18:27.497Z
verbose fdm[5C7FB70] [Originator@6876 sub=Policy opID=SWI-34866f39]
[ClusterConfigCache::ParseVmMonitoringConfiguration] VM guest OS monitoring is
enabled
2021-06-01T05:18:27.497Z
verbose fdm[5C7FB70] [Originator@6876 sub=Policy opID=SWI-34866f39]
[ClusterConfigCache::ParseVmMonitoringConfiguration] VM application monitoring
is disabled
2021-06-01T05:18:27.497Z
info fdm[5C7FB70] [Originator@6876 sub=Policy opID=SWI-34866f39]
[LocalVmMonitoringPolicy::EnableMonitoring] Initializing VM monitoring …
2021-06-01T05:18:29.792Z
verbose fdm[5EC8B70] [Originator@6876 sub=Invt opID=SWI-65f0221f]
[VmHeartbeatStateChange::SaveToInventory] vm
/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx
changed guestHB=red
2021-06-01T05:18:29.793Z
verbose fdm[5CC0B70] [Originator@6876 sub=Policy opID=SWI-78505581]
[LocalVmMonitoringPolicy::ProcessElements(VmHeartbeatStateChange)] VM
/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx:
guest heartbeat transitioned to red
2021-06-01T05:19:10.809Z
verbose fdm[583BB70] [Originator@6876 sub=Policy]
[VmPendingWorkVisitor::Visit(MonitorHeartbeatState)] red heartbeat detected.
Need to recheck
2021-06-01T05:19:13.810Z
verbose fdm[5D83B70] [Originator@6876 sub=Policy] [VmPendingWorkVisitor::Visit(MonitorHeartbeatState)]
red heartbeat detected. Need to recheck
2021-06-01T05:19:16.812Z
verbose fdm[5DC4B70] [Originator@6876 sub=Policy]
[VmPendingWorkVisitor::Visit(MonitorHeartbeatState)] red heartbeat detected.
Need to recheck
2021-06-01T05:19:29.797Z
verbose fdm[5BFDB70] [Originator@6876 sub=Policy opID=SWI-54e907a3]
[LocalVmMonitoringPolicy::GetQSHeartbeatStatus] Guest heartbeat is lost (red).
VM:/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx
2021-06-01T05:19:31.820Z
verbose fdm[5BFDB70] [Originator@6876 sub=Policy]
[VmPendingWorkVisitor::Visit(MonitorHeartbeatState)] red heartbeat detected.
Need to recheck
2021-06-01T05:19:34.822Z
verbose fdm[583BB70] [Originator@6876 sub=Policy]
[VmPendingWorkVisitor::Visit(MonitorHeartbeatState)] red heartbeat detected.
Need to recheck
2021-06-01T05:20:52.859Z verbose fdm[5C7FB70]
[Originator@6876 sub=Policy]
[MonitorHeartbeatState::MoveToResetStateIfTimersExpired] VM
/vmfs/volumes/5ef0d215-6e539201-b768-0090faf18798/ds-abc202-104z.prod.abc.local/ds-abc202-104z.prod.abc.local.vmx
is going to reset state because failure timer expired. Reset no: 1 out of Max
allowed reset count: 3
2021-06-01T05:21:12.915Z info fdm[5E87B70]
[Originator@6876 sub=vpxLro opID=lro-848-58631ed5-01] [VpxLRO] — BEGIN lro-849
— — VimOperationQ: VmReset —
vCenter Server: avcenter504z.prod.abc.local
VPXD Logs:
- Unable to
Find any ESXi Host disconnection issues from the VPXD Logs for the ESXi
Host.
Conclusion:
- Based on
the logs we can conclude that the issue happened because of the ESXi Host
Lost the Heartbeats to the Virtual Machine and Since VM Monitoring was
configured as per the process the Virtual machine was Restarted.
For More details on VM
Monitoring: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.avail.doc/GUID-62B80D7A-C764-40CB-AE59-752DA6AD78E7.html
- This
issue generally happens in the Scenario where the Virtual Machine has
either Crashed with a BSOD or went into a Hung State. In these scenarios
the ESXi Host will not be able to get the Heartbeat messages and considers
the machine to be Dead.
Action Plan
- As per
the Article: https://kb.vmware.com/s/article/1027734?lang=en_US
- This is
the issue which is triggered from inside the GuestOS. I will recommend you
to check with the OS Vendor for further investigation.
- Incase if
this is an issue of VM Getting Hung and if you require a Hang Dump for the
Virtual Machine, I will recommend you to disable the VM Monitoring and
wait for the machine to get hung again. Post that you can capture the vmem
and vmss files to create the Dump file for the Virtual machine using :
- For more
details, please refer to: https://kb.vmware.com/s/article/2003941