Host on which VM was running before crash: abcde01203.svr.emea.xyze.net
Host on which VM was running after crash: abcde01188.svr.emea.xyze.net
Time of VM reboot: approx. 12:26 UTC
VM Name: abcd00008134
- From the VM's logs we are not able to see much:
2019-10-07T12:26:28.421Z| vmx| I120: Hostname=abcde01188.emea.ad.xyze.com
2019-10-07T12:26:28.478Z| vmx| I120: /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx: Setup symlink /var/run/vmware/268c3b0ccf27a6da293d986a40960bea -> /var/run/vmware/root_0/1570451188421971_82029353
2019-10-07T12:26:28.486Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2019-10-07T12:26:28.486Z| vmx| I120: Transitioned vmx/execState/val to poweredOff
2019-10-07T12:26:28.486Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2019-10-07T12:26:28.486Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=2, newAppState=1877, success=1 additionalError=0
2019-10-07T12:26:28.486Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=3, newAppState=1881, success=1 additionalError=0
2019-10-07T12:26:28.517Z| vmx| W110: ObtainHardwareID unexpected failure: 22.
2019-10-07T12:26:28.517Z| vmx| W110: Hostinfo_MachineID ObtainHardwareID failure (Invalid argument); providing default.
2019-10-07T12:26:28.698Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2019-10-07T12:26:28.698Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=2, newAppState=1878, success=1 additionalError=0
2019-10-07T12:26:28.925Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1871, success=1 additionalError=0
2019-10-07T12:26:28.929Z| vmx| I120: DICT toolScripts.afterResume = TRUE
2019-10-07T12:26:28.929Z| vmx| I120: DICT uuid.bios = 42 29 72 58 fc 94 ae 4d-46 2f a2 2e a9 73 53 15
2019-10-07T12:26:28.929Z| vmx| I120: DICT vc.uuid = 50 29 86 64 c9 59 96 7d-7c 2f 0f b5 6f e6 46 4a
2019-10-07T12:26:28.930Z| vmx| I120: DICT uuid.location = 56 4d 5e e0 b9 40 34 08-43 ab 08 65 17 f8 41 22
2019-10-07T12:26:28.936Z| vmx| I120: Vix: [82029353 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2019-10-07T12:26:28.966Z| vmx| I120: External-interrupt exiting {0,1}
2019-10-07T12:26:28.966Z| vmx| I120: Process posted interrupts {0,1}
2019-10-07T12:26:28.966Z| vmx| I120: Interrupt-window exiting {0,1}
2019-10-07T12:26:28.966Z| vmx| I120: Virtual-interrupt delivery {0,1}
2019-10-07T12:26:28.966Z| vmx| I120: Acknowledge interrupt on exit {0,1}
2019-10-07T12:26:28.970Z| vmx| I120: MONITOR MODE: guestOS preferred modes: HWMMU HV BT32
2019-10-07T12:26:29.098Z| vmx| I120: UUID: Writing uuid.location value: '56 4d 7a 98 25 a0 56 8c-ed 21 7e 75 63 25 a2 b9'
2019-10-07T12:26:29.098Z| vmx| A115: ConfigDB: Setting uuid.location = "56 4d 7a 98 25 a0 56 8c-ed 21 7e 75 63 25 a2 b9"
2019-10-07T12:26:29.220Z| Worker#1| I120: DISKLIB-LINK : Opened '/vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134_1.vmdk' (0xa): vmfs, 2013265920 sectors / 960 GB.
2019-10-07T12:26:29.222Z| Worker#0| I120: DISKLIB-LINK : Opened '/vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmdk' (0xa): vmfs, 167772160 sectors / 80 GB.
2019-10-07T12:26:29.256Z| vmx| I120: DumpDiskInfo: scsi0:0 createType=11, capacity = 167772160, numLinks = 1, allocationType = 2
2019-10-07T12:26:29.262Z| vmx| I120: DumpDiskInfo: scsi1:0 createType=11, capacity = 2013265920, numLinks = 1, allocationType = 2
Hostname: abcde01203.emea.ad.xyze.com
VMware ESXi 5.5.0 build-6480324 (ESXi 5.5 Express Patch 11)
VMK Summary Logs:
2019-10-07T12:32:15Z bootstop: Host has booted
VOBD Logs:
- Reviewed the vobd logs; we can see the link-state changes, which are probably the outcome of the host reboot.
2019-10-07T12:30:57.321Z: [netCorrelator] 56984901us: [vob.net.vmnic.linkstate.down] vmnic vmnic2 linkstate down
2019-10-07T12:30:57.927Z: [netCorrelator] 57590171us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down
2019-10-07T12:30:58.002Z: [netCorrelator] 57665718us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic2 linkstate is down
2019-10-07T12:30:58.002Z: An event (esx.problem.net.vmnic.linkstate.down) could not be sent immediately to hostd; queueing for retry.
2019-10-07T12:30:58.002Z: [netCorrelator] 57665944us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic3 linkstate is down
2019-10-07T12:30:58.002Z: An event (esx.problem.net.vmnic.linkstate.down) could not be sent immediately to hostd; queueing for retry.
2019-10-07T12:32:15.324Z: [UserLevelCorrelator] 134987297us: [vob.user.host.boot] Host has booted.
2019-10-07T12:32:15.324Z: [GenericCorrelator] 134987297us: [vob.user.host.boot] Host has booted.
2019-10-07T12:32:15.324Z: [UserLevelCorrelator] 134987720us: [esx.audit.host.boot] Host has booted.
VMKernel Logs:
- From the vmkernel logs as well, we are not able to isolate anything specific other than the boot events.
2019-10-07T11:08:23.531Z cpu36:33806)NMP: nmp_ResetDeviceLogThrottling:3311: Error status H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x20 0x0 from dev "naa.600508b1001c1584d73e9e8d8b8953a7" occurred 1 times(of 0 commands)
2019-10-07T11:38:23.614Z cpu36:33806)NMP: nmp_ResetDeviceLogThrottling:3311: Error status H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x20 0x0 from dev "naa.600508b1001c1584d73e9e8d8b8953a7" occurred 1 times(of 0 commands)
2019-10-07T12:08:23.697Z cpu64:33806)NMP: nmp_ResetDeviceLogThrottling:3311: Error status H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x20 0x0 from dev "naa.600508b1001c1584d73e9e8d8b8953a7" occurred 1 times(of 0 commands)
TSC: 186852 cpu0:1)BootConfig: 88:
mcaEnableAllErrorSources = FALSE
0:00:00:00.000 cpu0:1)WARNING: Serial: 646: Invalid serial port config:
mem-mapped to addr 0x0.
0:00:00:00.736 cpu0:1)Starting VMKernel initialization …
0:00:00:00.749 cpu0:1)Initializing memory …
0:00:00:17.577 cpu0:1)WARNING: MemMap: 2803: Node 3 has no free pages
0:00:00:17.598 cpu0:1)IRQ: 540: 0xfe <error> exclusive, flags 0x0
0:00:00:17.601 cpu0:1)WARNING: Vmkperf: 3929: 2 perf counters in use by BIOS
0:00:00:23.663 cpu0:32768)Initializing storage stack …
0:00:00:23.789 cpu0:32768)PCI: 59: Failed to initialize _OSC for 0000:7f
0:00:00:24.267 cpu0:32768)Init: 879: Vmkernel initialization done.
0:00:00:24.267 cpu0:32768)VMKernel loaded successfully.
2019-10-07T12:30:24.822Z cpu0:32768)Loading module user …
2019-10-07T12:30:24.836Z cpu0:32768)user loaded successfully.
2019-10-07T12:32:15.563Z cpu69:35974)Boot Successful
2019-10-08T02:59:48.064Z cpu36:193558)Migrate: vm 193561: 3286: Setting VMOTION info: Dest ts = 1570503582214591, src ip = <169.78.11.7> dest ip = <169.78.11.2> Dest wid = 0 using SHARED swap
2019-10-08T02:59:48.085Z cpu36:193558)Hbr: 3340: Migration start received (worldID=193561) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
Hostd Logs:
- The hostd logs show some events associated with the fds (file descriptor) limit under memory:
https://ikb.vmware.com/s/article/50111942?lang=en_US
2019-10-07T12:25:10.858Z [3EBC1B70 verbose 'SoapAdapter'] Responded to service state request
2019-10-07T12:25:31.952Z [3EDC1B70 verbose 'SoapAdapter'] Responded to service state request
Section for VMware ESX, pid=34510, version=5.5.0, build=6480324, option=Release
------ In-memory logs start ------
mem> 2019-10-07T12:31:48.068Z [FF9589A0 verbose 'Default'] No update forwarding configured
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Default'] Supported VMs 512
mem> 2019-10-07T12:31:48.068Z [FF9589A0 warning 'Default'] Estimated fds limit 4096 > 4096 max supported by setrlimit. Setting fds limit to 4096
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Handle checker'] Setting system limit of 4096
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Handle checker'] Set system limit to 4096
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Default'] Setting malloc mmap threshold to 32 k
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Default'] getrlimit(RLIMIT_NPROC): curr=64 max=128, return code = Success
mem> 2019-10-07T12:31:48.068Z [FF9589A0 info 'Default'] setrlimit(RLIMIT_NPROC): curr=128 max=128, return code = Success
------ In-memory logs end ------
Hostname: abcde01188.emea.ad.xyze.com
VMware ESXi 5.5.0 build-6480324 (ESXi 5.5 Update 3f)
FDM Logs:
2019-10-07T12:26:11.739Z [60CC2B70 verbose 'Execution'] [ActionScheduler::AddPendingActionInt] Added pending action opId = host-15886:0-0, cfgPath = /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx, type = VmFailover, priority = 32
2019-10-07T12:26:11.739Z [FFC7AB70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::StartAsync] Failing over vm /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx (isRegistered=false)
2019-10-07T12:26:11.739Z [FFC7AB70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::StartAsync] Registering vm
2019-10-07T12:26:21.750Z [60CC2B70 warning 'commonvpxLro'] [VpxLRO] No taskinfo property updates for haTask-ha-folder-vm-vim.Folder.registerVm-274763869 in 10000 ms
2019-10-07T12:26:21.753Z [60CC2B70 warning 'commonvpxLro'] [VpxLRO] Taskinfo polling sees no change
2019-10-07T12:26:28.228Z [60CC2B70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::RegisterCompletionCallback] Registering vm done (vmid=/vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx, hostdVmId=2)
2019-10-07T12:26:28.231Z [60CC2B70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::InitiateReconfigure] Not reconfiguring vm
2019-10-07T12:26:28.231Z [60CC2B70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::ReconfigureCompletionCallback] Powering on vm
2019-10-07T12:26:29.094Z [60540B70 verbose 'Execution' opID=SWI-2a65dcf0] [ExecutionManagerImpl::HandleVmQuestion] Question while failing over vm /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx, Label = msg.uuid.altered
2019-10-07T12:26:29.760Z [60C40B70 verbose 'Execution' opID=host-15886:0-0] [FailoverAction::PowerOnCompletionCallback] Power on vm /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx done
2019-10-07T12:26:48.077Z [FFBD5B90 verbose 'Invt' opID=SWI-42409122] [VmHeartbeatStateChange::SaveToInventory] vm /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx changed toolsStatus=toolsOk
VMKernel Logs:
- The VM worlds are started on this ESXi host as the VM is powered on by the failover:
2019-10-07T12:26:29.002Z cpu15:82029353)World: vm 82029356: 1466: Starting world vmm0:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029358: 1466: Starting world vmm1:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029359: 1466: Starting world vmm2:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029360: 1466: Starting world vmm3:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029361: 1466: Starting world vmm4:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029362: 1466: Starting world vmm5:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029363: 1466: Starting world vmm6:abcd00008134 of type 8
2019-10-07T12:26:29.019Z cpu15:82029353)World: vm 82029364: 1466: Starting world vmm7:abcd00008134 of type 8
Conclusion:
- From the logs we are not able to conclude the exact reason for the host reboot; the only related event we can find is below:
2019-10-07T12:32:03.664Z [FF9589A0 verbose 'Solo.VmwareCLI'] Command reboot (reboot)
- In order to investigate further, we need the support bundles from:
  - ESXi host abcde01194.svr.emea.xyze.net, as this is the master for FDM.
  - vCenter: 169.94.153.58
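Once a support bundle is available, a first pass like the one below can surface any deliberate reboot or shutdown evidence. This is a minimal sketch with assumptions: BUNDLE_DIR would point at the extracted bundle, and the sample hostd.log line is fabricated here to mirror the entry found in this case.

```shell
#!/bin/sh
# Sketch: scan an extracted support bundle's logs for shutdown/reboot markers.
# BUNDLE_DIR and its contents are stand-ins for illustration; point BUNDLE_DIR
# at a real extracted bundle instead of building sample data.
BUNDLE_DIR="$(mktemp -d)"
mkdir -p "$BUNDLE_DIR/var/log"
# Sample line mirroring the hostd entry observed in this case:
printf '%s\n' \
  "2019-10-07T12:32:03.664Z [FF9589A0 verbose 'Solo.VmwareCLI'] Command reboot (reboot)" \
  > "$BUNDLE_DIR/var/log/hostd.log"

# Case-insensitive search for common reboot/shutdown markers across the logs:
grep -rhiE 'command (reboot|shutdown)|bootstop' "$BUNDLE_DIR/var/log"
```

A hit on a "Command reboot" entry points to an initiated reboot; no hits at all around the crash window is itself informative, since it leaves a hardware or power event as the likelier cause.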
______________________________________________________________________________
Hostname: esx-abcde01194.emea.ad.xyze.com
FDM Logs:
- As per the FDM logs, we can see that the master lost communication with ESXi host-15862, after which a cluster failover was initiated:
2019-10-07T12:25:50.695Z [FFB3DB70 error 'Cluster' opID=SWI-56f32f43] [ClusterSlave::LiveCheck] Timeout for slave @ host-15862
2019-10-07T12:26:11.731Z [21E81B70 verbose 'Placement'] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event
2019-10-07T12:26:11.732Z [21A40B70 verbose 'FDM' opID=SWI-3ceb7cde] [FdmService] New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent vm= host= tag=host-15886:-390685961:1
2019-10-07T12:26:11.735Z [21EC2B70 verbose 'Placement' opID=SWI-7d923e5b] [DrmPE::GenerateFailoverRecommendation] Snapshot is valid
2019-10-07T12:26:11.735Z [21EC2B70 verbose 'Placement' opID=SWI-7d923e5b] [DrmPE::GenerateFailoverRecommendation] 1 Vms are to be powered on
2019-10-07T12:26:11.736Z [21EC2B70 verbose 'Placement' opID=SWI-7d923e5b] [DrmPE::GenerateFailoverRecommendation] 1 vms added to domain config
2019-10-07T12:26:11.736Z [21EC2B70 verbose 'Placement' opID=SWI-7d923e5b] [DrmPE::InvokeDrsMultiplePasses] Pass2: respect host preference but not failover hosts
- Since the failover task was initiated, the failed VM abcd00008134 was started:
2019-10-07T12:26:28.225Z [22281B70 verbose 'Invt' opID=SWI-2ce496be] [VmStateChange::SavePowerChange] Vm /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx uncleanly powered off; alreadyCreated=true
2019-10-07T12:26:29.761Z [21E81B70 verbose 'Policy' opID=SWI-5e8a2842] [VmOperationsManager::RequestCancelPlacement] VM ID /vmfs/volumes/5aead89d-4a42a019-c968-d06726cf4f12/abcd00008134/abcd00008134.vmx: Transitioned to 'Cancel' state for 'VmFailover'
2019-10-07T12:26:48.287Z [22240B70 verbose 'Placement'] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover complete event
2019-10-07T12:26:48.288Z [21EC2B70 verbose 'FDM' opID=SWI-19a5a584] [FdmService] New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionCompletedEvent vm= host= tag=host-15886:-390685961:2
2019-10-07T12:27:46.730Z [FFBBFB70 error 'Election' opID=SWI-6058ed8] [ClusterElection::SendAll] sendto 10.189.183.152 failed: Host is down
2019-10-07T12:27:47.731Z [FFBBFB70 error 'Election' opID=SWI-6058ed8] [ClusterElection::SendAll] sendto 10.189.183.152 failed: Host is down
2019-10-07T12:27:48.733Z [FFBBFB70 error 'Election' opID=SWI-6058ed8] [ClusterElection::SendAll] sendto 10.189.183.152 failed: Host is down
2019-10-07T12:27:49.734Z [FFBBFB70 error 'Election' opID=SWI-6058ed8] [ClusterElection::SendAll] sendto 10.189.183.152 failed: Host is down
2019-10-07T12:27:50.225Z [21EC2B70 warning 'Cluster'] [HostPing::Ping] sendto[ipv4] 10.189.183.152: Host is down
2019-10-07T12:27:50.736Z [FFBBFB70 error 'Election' opID=SWI-6058ed8] [ClusterElection::SendAll] sendto 10.189.183.152 failed: Host is down
2019-10-07T12:32:25.912Z [FFB3DB70 error 'Cluster' opID=SWI-56f32f43] Slave host-15862 is already registered
2019-09-07T14:25:33.155Z [21EC2B70 error 'commonvpxVmomi'] [VpxVmomi] IO exception while sending result
vCenter Logs:
- Logs are not available around 07/10/2019.
Hostname: abcde01203.emea.ad.xyze.com
VMK Summary Logs:
- From the VMK summary we can only see the event for the host having booted up:
2019-10-07T12:32:15Z bootstop: Host has booted
- When a host shuts down unexpectedly, you do not see any log information indicating the shutdown; the logs indicate a start-up only.
- For more information: https://kb.vmware.com/s/article/2036544
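That absence can be checked mechanically: look for a clean-shutdown entry preceding the boot entry in vmksummary.log. The sketch below builds a sample log containing only the boot entry seen in this case; the LOG path and the clean-shutdown marker patterns ("Host is rebooting" / "Host is powering off") are assumptions for illustration, not confirmed message text for this build.

```shell
#!/bin/sh
# Sketch: decide whether vmksummary.log records a clean shutdown before boot.
# LOG and the sample content are assumptions; use the real vmksummary.log.
LOG="$(mktemp)"
printf '%s\n' '2019-10-07T12:32:15Z bootstop: Host has booted' > "$LOG"

# If no shutdown/reboot marker precedes the boot entry, the shutdown was
# most likely unexpected (per KB 2036544).
if grep -qiE 'bootstop: Host (is rebooting|is powering off)' "$LOG"; then
  echo "clean shutdown recorded"
else
  echo "no clean-shutdown marker: unexpected shutdown likely"
fi
# prints: no clean-shutdown marker: unexpected shutdown likely
```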
- From the IPMI logs we can see a few events associated with a Processor IERR and soft-off, but we are not sure about the exact problem:
Record Id: 85 When: 2019-10-07T12:33:40 Message: Assert + Processor IERR
Record Id: 86 When: 2019-10-07T12:34:03 Message: Assert + System ACPI Power State S0/G0: working
Record Id: 87 When: 2019-10-07T12:34:30 Message: Assert + System ACPI Power State S4/S5: soft-off
Record Id: 88 When: 2019-10-07T12:34:33 Message: Assert + System ACPI Power State S0/G0: working
Record Id: 89 When: 2019-10-07T12:34:49 Message: Assert + System ACPI Power State S0/G0: working
Record Id: 90 When: 2019-10-07T12:30:02 Message: Deassert + Drive Slot Drive Present
Record Id: 91 When: 2019-10-07T12:30:02 Message: Assert + Drive Slot In Failed Array
Record Id: 92 When: 2019-10-07T12:30:02 Message: Deassert + Drive Slot Drive Present
Record Id: 93 When: 2019-10-07T12:30:02 Message: Assert + Drive Slot In Failed Array
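When handing the case to the hardware vendor, it helps to pre-filter the SEL records for the crash-relevant events. This is a sketch only: SEL_DUMP and its contents are fabricated here to mirror the records above, whereas a real dump would come from the vendor's management tooling (for example ipmitool's sel listing on the BMC).

```shell
#!/bin/sh
# Sketch: filter IPMI SEL-style records for events relevant to a host crash
# (Processor IERR, ACPI soft-off). SEL_DUMP is a sample built for illustration.
SEL_DUMP="$(mktemp)"
cat > "$SEL_DUMP" <<'EOF'
Record Id: 85 When: 2019-10-07T12:33:40 Message: Assert + Processor IERR
Record Id: 86 When: 2019-10-07T12:34:03 Message: Assert + System ACPI Power State S0/G0: working
Record Id: 87 When: 2019-10-07T12:34:30 Message: Assert + System ACPI Power State S4/S5: soft-off
EOF

# Keep only the records that indicate a processor error or power loss:
grep -E 'Processor IERR|soft-off' "$SEL_DUMP"
```

In the sample above, records 85 and 87 match while the "working" transition does not, which is the short list a vendor would want to start from.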
Conclusion:
- As per the article https://kb.vmware.com/s/article/2036544, we are not able to see any PSOD events or any user-initiated reboot task.
- Please check the hardware-associated logs for any events and tasks.
- Please get extensive hardware diagnostics done on the ESXi host in discussion with the hardware vendor.
- Since we do not have logs available in the vCenter bundle for the time when the issue happened, we are not able to find any vCenter-related tasks or information.