Hostname: c977abc.int.abcde.com | uptime: 0.0 days | uptime: 54
minutes
Build Version: VMware ESXi 6.5.0 build-10719125
Time of Issue:
2019-06-09T22:43:19.236Z: [GenericCorrelator]
82943240us: [vob.user.host.boot] Host has booted.
2019-06-09T22:43:13.631Z info hostd[B31CB80]
[Originator@6876 sub=Default] BEGIN SERVICES
IPMI log
From the IPMI Logs I can see that there is a
warning regarding the Voltage State asserted.
Record Id: 20 When: 2019-06-09T17:37:29 Message:
Assert + Voltage State Asserted
Record:20:
Record
Id: 20
When:
2019-06-09T17:37:29
Event
Type: 3 (Degraded/Warning)
SEL
Type: 2 (System Event)
Message:
Assert + Voltage State Asserted
Sensor
Number: 37
Raw:
Formatted-Raw: 14 00 02 d9 43 fd 5c 20 00 04 02 25 03 01 ff ff
VMKernel:
- From the
VMKernel Logs we can see that there are some SCSI Errors but its not
associated with our issue and its coming multiple times in a day:
2019-06-09T13:34:07.351Z
cpu35:66359)ScsiDeviceIO: 2954: Cmd(0x439542663980) 0x1a, CmdSN 0x2f3d4b from
world 0 to dev “naa.644a842030e8b5002244da3817e6abcd” failed
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2019-06-09T22:34:34.662Z
cpu1:66358)ScsiDeviceIO: 2954: Cmd(0x439562854c00) 0x1a, CmdSN 0x2f5f9a from
world 0 to dev “naa.644a842030e8b5002244da3817e6abcd” failed
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
- From here we can see that the Host was
rebooted around: 2019-06-09T22:34:34.662Z
VMB: 112:
mbMagic: 2badb002, mbInfo 0x1016a8
VMB: 56: flags a6d
0:00:00:00.000
cpu0:1)Init: 252: Time spent in vmkBoot: 906558 us
0:00:00:00.042 cpu0:1)Starting
VMKernel initialization …
0:00:00:00.050 cpu0:1)Initializing
memory …
0:00:00:10.315
cpu0:65536)Initializing storage stack …
0:00:00:10.490
cpu0:65536)VMKernel loaded successfully.
2019-06-09T22:42:06.414Z
cpu0:65536)Loading module user …
2019-06-09T22:42:06.430Z
cpu0:65536)user loaded successfully.
VOBD:
- From the VOBD logs we are not able to find much information
regarding reason for crash:
2019-06-09T22:42:33.486Z:
[netCorrelator] 37801385us: [vob.net.vmnic.linkstate.down] vmnic vmnic1
linkstate down
2019-06-09T22:42:33.492Z: [netCorrelator] 37807621us:
[vob.net.firewall.port.hooked] Port vmk0 is now protected by Firewall.
2019-06-09T22:42:33.492Z: [netCorrelator] 37807734us:
[esx.audit.net.firewall.port.hooked] Port vmk0 is now protected by Firewall.
2019-06-09T22:42:33.492Z: An event (esx.audit.net.firewall.port.hooked) could
not be sent immediately to hostd; queueing for retry.
2019-06-09T22:42:33.498Z: [netCorrelator] 37813506us:
[vob.net.firewall.port.hooked] Port vmk1 is now protected by Firewall.
2019-06-09T22:42:33.498Z: [netCorrelator] 37813594us:
[esx.audit.net.firewall.port.hooked] Port vmk1 is now protected by Firewall.
2019-06-09T22:42:33.498Z: An event (esx.audit.net.firewall.port.hooked) could
not be sent immediately to hostd; queueing for retry.
2019-06-09T22:42:33.988Z: [netCorrelator] 38303461us:
[vob.net.vmnic.linkstate.up] vmnic vmnic1 linkstate up
2019-06-09T22:42:34.002Z: [netCorrelator]
38317296us: [esx.problem.net.vmnic.linkstate.down] Physical NIC vmnic1
linkstate is down
2019-06-09T22:42:34.002Z: An event
(esx.problem.net.vmnic.linkstate.down) could not be sent immediately to hostd;
queueing for retry.
2019-06-09T22:42:34.002Z: [netCorrelator] 38317340us:
[esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic1 linkstate is up
2019-06-09T22:42:34.002Z: An event (esx.clear.net.vmnic.linkstate.up) could not
be sent immediately to hostd; queueing for retry.
2019-06-09T22:42:34.685Z: [netCorrelator]
39000197us: [vob.net.lacp.lag.transition.down] LACP warning: Link aggregation
group lag1 on VDS DvsPortset-0 is down.
2019-06-09T22:42:34.685Z: [netCorrelator]
39000591us: [esx.problem.net.lacp.lag.transition.down] LAG lag1 on VDS
DvsPortset-0 is down.
2019-06-09T22:42:34.685Z: An event
(esx.problem.net.lacp.lag.transition.down) could not be sent immediately to
hostd; queueing for retry.
2019-06-09T22:42:41.907Z:
[GenericCorrelator] 46222209us: [vob.user.coredump.configured2] At least one
coredump target is enabled.
2019-06-09T22:42:41.907Z: [UserLevelCorrelator] 46222209us:
[vob.user.coredump.configured2] At least one coredump target is enabled.
2019-06-09T22:42:42.445Z: An event (esx.audit.vmfs.volume.mounted) could not be
sent immediately to hostd; queueing for retry.
2019-06-09T22:42:42.490Z: [vmfsCorrelator] 46805453us:
[vob.vmfs.nfs.server.restored] Restored connection to the server abcdlnt-h0141.int.abcde.com
mount point /cb0780_infra_virtual_ich0141_vol01_snap07, mounted as
9d54a394-b36f2c04-0000-000000000000
(“cb0780_infra_virtual_ich0141_vol01_snap07”)
2019-06-09T22:42:42.490Z:
[vmfsCorrelator] 46805555us: [esx.problem.vmfs.nfs.server.restored] abcdlnt-h0141.int.abcde.com
/cb0780_infra_virtual_ich0141_vol01_snap07 9d54a394-b36f2c04-0000-000000000000
cb0780_infra_virtual_ich0141_vol01_snap07
2019-06-09T22:42:42.490Z: An event (esx.problem.vmfs.nfs.server.restored) could
not be sent immediately to hostd; queueing for retry.
2019-06-09T22:42:42.542Z: [vmfsCorrelator] 46858041us:
[vob.vmfs.nfs.server.restored] Restored connection to the server abcdlnt-h0027.int.abcde.com
mount point /cb0780_infra_virtual_abc0027_vol04_snap07, mounted as
fc1d0c2f-fbfeb095-0000-000000000000 (“cb0780_infra_virtual_abc0027_vol04_snap07″)
2019-06-09T22:42:42.542Z:
[vmfsCorrelator] 46858140us: [esx.problem.vmfs.nfs.server.restored] abcdlnt-h0027.int.abcde.com
/cb0780_infra_virtual_abc0027_vol04_snap07
fc1d0c2f-fbfeb095-0000-000000000000 cb0780_infra_virtual_abc0027_vol04_snap07
Hostd Logs:
—— Early
init logs start ——–
2019-06-09T22:43:03.273Z verbose -[B31CB80] [Originator@6876 sub=Default] No
update forwarding configured
2019-06-09T22:43:03.273Z info -[B31CB80] [Originator@6876 sub=Default]
Supported VMs 768
2019-06-09T22:43:03.273Z
warning -[B31CB80] [Originator@6876 sub=Default] Estimated fds limit 4608 >
4096 max supported by setrlimit. Setting fds limit to 4096
2019-06-09T22:43:03.273Z info -[B31CB80] [Originator@6876 sub=Handle checker]
Setting system limit of 4096
2019-06-09T22:43:03.505Z
error hostd[B31CB80] [Originator@6876 sub=Hostsvc] Error reading boot device
type: Hardware Query Error: Unable to get
boot order from BIOS or operation not
supported on this hardwareNot supported
2019-06-09T22:43:03.960Z
warning hostd[B31CB80] [Originator@6876 sub=Solo.EnvironmentManager] Failed to
enumerate parallel devices
2019-06-09T22:43:05.566Z info hostd[B31CB80]
[Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.nvme.device.log.error)
ha-cli-handler-nvme-device-log-error created
Conclusion:
- Based on the logs we were not able to find any reason for the host to fail. Now this could be due to two reasons
- ASR: Automated System Recovery: Please check and verify if the Automatic System Recovery is enabled, This will prevent the Host from creating a dump file.
- For more information please refer: https://www.dell.com/support/article/us/en/04/sln285610/how-to-modify-automated-system-recovery-asr-settings-in-openmanage-system-administrator-omsa?lang=en
- From the IPMI Logs I can see that there is a warning regarding the Voltage State asserted. However this is coming few hours earlier to our issue but it’s worth Investigating. Check on your Hardware events if we are able to see anything. Also engage the Hardware vendor, Incase if you find anything.