Hostname: abcesx103.abc.com
ESXi Version: ESXi 6.0 EP 15
As per the article https://kb.vmware.com/s/article/66977, vSphere 6.0 is already out of general support.
NIC Details:
vmnic  | PCI bus address | link | speed | duplex | MTU  | driver | driver version | firmware version | MAC address       | VID  | DID  | SVID | SDID | name
vmnic0 | 0000:04:00.0    | Up   | 1000  | Full   | 1500 | nx_nic | 0.25744213     | 0.173564815      | 80:c1:6e:6f:e3:14 | 4040 | 100  | 103c | 705a | NetXen HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
vmnic1 | 0000:04:00.1    | Up   | 1000  | Full   | 1500 | nx_nic | 0.25744213     | 0.173564815      | 80:c1:6e:6f:e3:15 | 4040 | 100  | 103c | 705a | NetXen HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
vmnic2 | 0000:04:00.2    | Up   | 1000  | Full   | 1500 | nx_nic | 0.25744213     | 0.173564815      | 80:c1:6e:6f:e3:16 | 4040 | 100  | 103c | 705a | NetXen HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
vmnic3 | 0000:04:00.3    | Up   | 1000  | Full   | 1500 | nx_nic | 0.25744213     | 0.173564815      | 80:c1:6e:6f:e3:17 | 4040 | 100  | 103c | 705a | NetXen HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter
vmnic4 | 0000:10:00.0    | Up   | 1000  | Full   | 1500 | e1000e | 3.2.2.1-NAPI   | 5.12-2           | 00:26:55:db:a8:d1 | 8086 | 10bc | 103c | 704b | Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
vmnic5 | 0000:10:00.1    | Up   | 1000  | Full   | 1500 | e1000e | 3.2.2.1-NAPI   | 5.12-2           | 00:26:55:db:a8:d0 | 8086 | 10bc | 103c | 704b | Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
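- For reference, the NIC inventory above can be re-collected live on the host with standard ESXi commands (a quick sketch; the vmnic names are taken from this host's output):
    esxcli network nic list    # uplink name, driver, link state, speed, duplex, MTU, MAC address and description
    esxcfg-nics -l             # equivalent legacy view of the same information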
- As per the VMware HCL, driver version 6.0.643 is compatible with firmware 4.0.595; however, we are running firmware version 4.0.596.
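- To confirm the running driver and firmware against the HCL entry, the following can be run on the host (a sketch; the VIB name pattern for the NetXen driver is an assumption and may differ, e.g. net-nx-nic):
    esxcli network nic get -n vmnic0          # Driver Info section lists the driver, driver version and firmware version
    esxcli software vib list | grep -i nx     # installed NetXen driver VIB and its version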
VMKsummary:
- Last Boot: 2021-03-01T16:16:22Z (bootstop: Host has booted)
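- The boot marker above can be confirmed from the vmksummary log (standard log location assumed):
    grep -i bootstop /var/log/vmksummary.log    # "Host has booted" entries mark each start-up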
VOBD Logs:
- Reviewed the vobd logs but was not able to find any issues:
2021-03-01T16:14:38.601Z: [netCorrelator] 60982188us: [vob.net.vmnic.linkstate.up] vmnic vmnic4 linkstate up
2021-03-01T16:14:39.002Z: [netCorrelator] 61383500us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic4 linkstate is up
2021-03-01T16:14:39.002Z: An event (esx.clear.net.vmnic.linkstate.up) could not be sent immediately to hostd; queueing for retry.
2021-03-01T16:14:39.104Z: [netCorrelator] 61484667us: [vob.net.vmnic.linkstate.up] vmnic vmnic5 linkstate up
2021-03-01T16:14:40.002Z: [netCorrelator] 62383635us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic5 linkstate is up
2021-03-01T16:14:40.002Z: An event (esx.clear.net.vmnic.linkstate.up) could not be sent immediately to hostd; queueing for retry.
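- For reference, the link-state events above can be pulled with a simple filter on the vobd log (standard log location assumed):
    grep -i linkstate /var/log/vobd.log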
Hostd Logs:
- Reviewed the hostd logs and we can see a refresh task running before the time of the issue, but with no specific error that could crash the host.
2021-03-01T16:15:46.161Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 5ef0a76e-519f1a44-aaf1-002655dbbf1d, name abcvmax-ESXi_R5, version 5
2021-03-01T16:15:47.504Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 5f047095-98f8bec1-3532-0017a4774010, name abcvmax_VM-001, version 5
2021-03-01T16:15:49.232Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 5f0470dd-62ad6da6-0758-0017a4774010, name abcvmax_VM-003, version 5
2021-03-01T16:15:50.938Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 5f0470b8-1910c456-b149-0017a4774010, name abcvmax_VM-002, version 5
2021-03-01T16:15:51.625Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 51f6a550-460c0e78-f563-ac162dbd8980, name abcvmax-netscout001, version 5
2021-03-01T16:15:53.465Z verbose hostd[FF94DB20] [Originator@6876 sub=Hostsvc.FSVolumeProvider] RefreshVMFSVolumes: refreshed volume, id 5f047110-e35fb5f2-8741-0017a4774010, name abcvmax_VM-004, version 5
2021-03-01T16:15:53.755Z info hostd[FF94DB20] [Originator@6876 sub=Hostsvc.Datastore] Refresh: VMFS Datastore name was changed from '5ef0a76e-519f1a44-aaf1-002655dbbf1d' to 'abcvmax-ESXi_R5'.
2021-03-01T16:15:54.473Z info hostd[FF94DB20] [Originator@6876 sub=Hostsvc.Datastore] Refresh: VMFS Datastore name was changed from '5f047095-98f8bec1-3532-0017a4774010' to 'abcvmax_VM-001'.
2021-03-01T16:15:55.264Z info hostd[FF94DB20] [Originator@6876 sub=Hostsvc.Datastore] Refresh: VMFS Datastore name was changed from '5f0470dd-62ad6da6-0758-0017a4774010' to 'abcvmax_VM-003'.
- Here we can see that the server restarted:
--> fullName = "VMware ESX build-9313334",
--> version = "6.0.0",
--> build = "9313334",
2021-03-01T16:16:04.776Z info hostd[FF94DB20] [Originator@6876 sub=Hostsvc.HaHost] Initialized StoragerRM/Vmsvc powering-on interceptor
2021-03-01T16:16:15.643Z info hostd[FF94DB20] [Originator@6876 sub=Default] BEGIN SERVICES
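- The hostd restart point above can be located by filtering on the service start-up markers (a sketch; standard log location assumed):
    grep -E "BEGIN SERVICES|build-" /var/log/hostd.log    # older boots may only be present in the rotated hostd logs under /var/run/log/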
VMKernel Logs:
- In the vmkernel logs we are not able to find much detail:
2021-03-01T16:01:21.839Z cpu2:34405)FSS: 6264: Failed to open file 'slotsfile'; Requested flags 0x8, world: 34405 [storageRM], (Existing flags 0x0, world: 0 [unknown]): Busy
2021-03-01T16:02:13.728Z cpu38:33187)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.60000970000192605719533030363733".
2021-03-01T16:02:13.740Z cpu48:33186)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.60000970000192605719533030363546".
2021-03-01T16:02:13.745Z cpu0:33190)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.60000970000192605719533030363233".
2021-03-01T16:02:13.747Z cpu33:33189)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.60000970000192605719533030363442".
2021-03-01T16:02:13.748Z cpu33:33189)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate:1099: Could not select path for device "naa.60000970000192605719533030363337".
2021-03-01T16:06:32.871Z cpu63:34405)DLX: 3876: vol 'abcvmax_VM-003', lock at 115658752: [Req mode 1] Checking liveness:
VMB: 49: mbMagic: 2badb002, mbInfo 0x1014b8
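- The round-robin path warnings can be followed up by checking the current path state of the affected devices (a sketch; the device ID is taken from the log above):
    esxcli storage nmp device list -d naa.60000970000192605719533030363733    # PSP/SATP configuration and working paths for the device
    esxcli storage core path list -d naa.60000970000192605719533030363733     # state of each individual path to the device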
FDM Logs:
- Here we can see that the master for the cluster is host-633, which could possibly have more details about this issue:
2021-03-01T16:02:20.702Z info fdm[34E81B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::LogState] hostId=host-602 state=Slave master=host-633 isolated=false host-list-version=144 config-version=270 vm-metadata-version=1187 slv-mst-tdiff-sec=0
2021-03-01T16:16:21.027Z verbose fdm[FFA707C0] [Originator@6876 sub=Default] Dumping early logs:
------ Early init logs start --------
2021-03-01T16:16:21.026Z info -[FFA707C0] [Originator@6876 sub=Default] Glibc malloc guards disabled.
2021-03-01T16:16:21.026Z info -[FFA707C0] [Originator@6876 sub=Default] Initialized SystemFactory
------ Early init logs end --------
2021-03-01T16:16:21.027Z info fdm[FFA707C0] [Originator@6876 sub=Default] Logging uses fast path: false
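- The master/slave role and the master's identity can be confirmed from the host's own fdm log (standard log location assumed):
    grep "ClusterManagerImpl::LogState" /var/log/fdm.log | tail -n 5    # latest state lines show state=Slave/Master and the current master host ID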
IPMI Logs:
- The IPMI logs do not appear to have been updated since 2018:
Record:12:
Record Id: 12
When: 2018-01-06T12:28:58
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + Power Supply Failure status
Sensor Number: 6
Raw:
Formatted-Raw: 0c 00 02 0a c1 50 5a 41 00 04 08 06 6f 01 ff ff
FDM Master:
- From the FDM master's logs we can see that the master lost the heartbeat from the ESXi host (host-602), and the host was considered Dead:
2021-03-01T16:07:50.796Z verbose fdm[54550B70] [Originator@6876 sub=Placement opID=SWI-ff522be] [RR::ResetAwaitingChangeVms] 0 Vms have been updated
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Slave host-602 heartbeating again => Dead to Partitioned
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Stopping datastore heartbeat checks for slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] host-602 /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::StopHBDatastoreChecking] datastore /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010 slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StopHBDatastoreChecking] slave host-602 mac 00:26:55:db:a8:d1
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] host-602 /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterManagerImpl::StopHBDatastoreChecking] datastore /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010 slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StopHBDatastoreChecking] slave host-602 mac 00:26:55:db:a8:d1
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] path /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010 slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] slave host-602 uuid.mac 00:26:55:db:a8:d1
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] Forcing heartbeat check on datastore /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010 for slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] path /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010 slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] slave host-602 uuid.mac 00:26:55:db:a8:d1
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::StartHBDatastoreChecking] Forcing heartbeat check on datastore /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010 for slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Starting datastore heartbeat checking for slave host-602
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::CheckSlaveHeartbeats] Checking heartbeat datastore /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010 on localhost
2021-03-01T16:07:51.180Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] [ClusterDatastore::CheckSlaveHeartbeats] Checking heartbeat datastore /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010 on localhost
2021-03-01T16:07:51.181Z info fdm[539C2B70] [Originator@6876 sub=Invt opID=SWI-26bbf021] [HostStateChange::SaveToInventory] host host-602 changed state: Partitioned
2021-03-01T16:07:51.181Z verbose fdm[539C2B70] [Originator@6876 sub=Vmcp opID=SWI-26bbf021] Canceling VM reservation for non-live host host-602
2021-03-01T16:07:53.834Z verbose fdm[543CAB70] [Originator@6876 sub=FDM] [EventManagerImpl] Received event from host-617
- We can see that the ESXi host state changed from Partitioned to Dead:
2021-03-01T16:08:06.198Z verbose fdm[54181B70] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] Waited 15 seconds for disk heartbeat for host host-602 - declaring dead
2021-03-01T16:08:06.201Z info fdm[5440BB70] [Originator@6876 sub=Invt opID=SWI-1dd14802] [HostStateChange::SaveToInventory] host host-602 changed state: Dead
- Slave status at the time of the issue:
--> Slave states (17):
--> host-720: Live
--> host-801: Live
--> host-602: Dead
--> host-736: Live
--> host-777: Live
--> host-750: Live
--> host-704: Live
--> host-693: Live
--> host-648: Live
--> host-593: Live
--> host-640: Live
--> host-679: Live
--> host-764: Live
--> host-617: Live
--> host-791: Live
--> host-579: Live
--> host-660: Live
- Post reboot, the state of the ESXi host came back to Live:
2021-03-01T16:09:41.086Z verbose fdm[545D2B70] [Originator@6876 sub=Invt opID=SWI-13f56d05] [VmHeartbeatStateChange::SaveToInventory] vm /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010/abcaseemdb001/abcaseemdb001.vmx changed guestHB=red
2021-03-01T16:09:52.454Z verbose fdm[531C2B70] [Originator@6876 sub=Invt opID=SWI-5fc992a1] [VmHeartbeatStateChange::SaveToInventory] vm /vmfs/volumes/5f047110-e35fb5f2-8741-0017a4774010/abcintlqueuing003/abcintlqueuing003.vmx changed guestHB=red
2021-03-01T16:13:48.957Z verbose fdm[543CAB70] [Originator@6876 sub=Invt opID=SWI-71d88f31] [VmHeartbeatStateChange::SaveToInventory] vm /vmfs/volumes/5f0470dd-62ad6da6-0758-0017a4774010/abcsysdb001/abcsysdb001.vmx changed guestHB=red
2021-03-01T16:15:06.243Z verbose fdm[53940B70] [Originator@6876 sub=Invt opID=SWI-49db1f66] [VmHeartbeatStateChange::SaveToInventory] vm /vmfs/volumes/5f047095-98f8bec1-3532-0017a4774010/abcevent109/abcevent109.vmx changed toolsStatus=toolsNotRunning
2021-03-01T16:15:06.614Z info fdm[53181B70] [Originator@6876 sub=Invt opID=SWI-5560ab4d] [HostStateChange::SaveToInventory] host host-602 changed state: FDMUnreachable
2021-03-01T16:16:23.545Z info fdm[54654B70] [Originator@6876 sub=Invt opID=SWI-6b9c10a4] [HostStateChange::SaveToInventory] host host-602 changed state: Live
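- The full state-transition sequence for host-602 (Partitioned -> Dead -> FDMUnreachable -> Live) can be extracted from the master's fdm log (a sketch; standard log location assumed, older entries may be in the rotated fdm logs):
    grep "HostStateChange::SaveToInventory" /var/log/fdm.log | grep host-602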
Conclusion:
- Based on the logs, we can see that the ESXi host lost communication with the HA master node (host-633) and was marked as Dead. However, we are not able to find anything conclusive about the reason for the restart of the ESXi host.
As per the article https://kb.vmware.com/s/article/1019238:
- We do not see any BlueScreenTimeout (which would auto-reboot the host after a PSOD).
- No power-button task initiated a boot.
- There is no haTask-ha-host-vim.HostSystem.reboot task in the hostd logs, which would be a sign that the reboot was triggered by vCenter.
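- For reference, a sketch of the corresponding checks on the host (standard log locations and the default Misc.BlueScreenTimeout advanced option path are assumed):
    grep -i "haTask-ha-host-vim.HostSystem.reboot" /var/log/hostd.log    # a vCenter-initiated reboot would leave this task in hostd.log
    grep -i bootstop /var/log/vmksummary.log                             # a clean shutdown/reboot would leave a marker before the boot entry
    esxcli system settings advanced list -o /Misc/BlueScreenTimeout      # a non-zero value auto-reboots the host after a PSOD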
Action Plan:
- I would recommend checking with your hardware vendor to have extensive hardware diagnostics run on the ESXi host.
- Check whether ASR (Automatic Server Recovery) is enabled for the ESXi host, as it can restart the host without an ESXi crash.
- Please check with your hardware vendor to confirm whether the currently running firmware version is supported with the driver as per the VMware HCL; we are currently running 4.0.596, while the supported version is 4.0.595.