RCA 18: ESXi Host in Not Responding State

Host abcdvvdivh106.gso.abcd.com in IPC2-VDI is not responding

Host hpnpvvdivh102.gso.abcd.com in IPC2-VDI is not responding

 

Time stamp of the issue:6/12/2019 9:48:05 AM MST

Time Stamp in GMT:: 16:48 12.06.2019 GMT

 

vCenter Logs:

 

  •  From the vCenter logs we can see that around 16:44 we can see that Multiple Host got disconnected from the vCenter Server.

 

2019-06-12T16:44:13.085Z warning vpxd[7F12837EF700] [Originator@6876 sub=vmomi.soapStub[18334] opID=287c5427] Terminating invocation: server=<cs p:00007f124c0ee530, TCP:hpnpvvdivh04.gso.abcd.com:443>, moref=vim.encryption.CryptoManager:ha-crypto-manager, method=IsEnabled
2019-06-12T16:44:49.680Z warning vpxd[7F12A04EF700] [Originator@6876 sub=VpxProfiler opID=HB-host-81554@97888-2850012f-EventManagerProcessJobs-5f991d3] EventManagerProcessJobs [TotalTime] took 304150 ms

9-06-13T02:57:32.902Z info vpxd[7F12A1188700] [Originator@6876 sub=Default opID=540c73c8] [VpxLRO] — ERROR lro-348456864 — guestOperationsAuthManager — vim.vm.guest.AuthManager.acquireCredentials: vim.fault.GuestAuthenticationChallenge:

 

2019-06-12T16:44:49.782Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host abcdvvdivh106.gso.abcd.com
2019-06-12T16:44:49.798Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh03.gso.abcd.com
2019-06-12T16:44:50.964Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh104.gso.abcd.com
2019-06-12T16:44:52.178Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh04.gso.abcd.com
2019-06-12T16:44:53.029Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh102.gso.abcd.com
2019-06-12T16:44:55.755Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh103.gso.abcd.com
2019-06-12T16:44:55.963Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh101.gso.abcd.com
2019-06-12T16:44:57.012Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh01.gso.abcd.com
2019-06-12T16:44:57.130Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh02.gso.abcd.com
2019-06-12T16:44:57.283Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh105.gso.abcd.com
2019-06-12T16:44:57.396Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh05.gso.abcd.com

 

 

 

 

Hostname: abcdvvdivh106.gso.abcd.com

vCenter with IP 11.11.6.42

 

 

VOBD:

Doesn’t have logs at the time of issue

 

VPXA:

Doesn’t have any information when we have seen the disconnection

 

VMKernel Logs:

 

  • From the VMKernel Logs we can see that there is event for VMKAPIMOD after which there is a LAG Create Event.

 

2019-06-12T16:48:09.932Z cpu62:68755 opID=da63ffd6)World: 12235: VC opID HB-SpecSync-host-81563@110-d4b1b29-19-9c56 maps to vmkernel opID da63ffd6
2019-06-12T16:48:09.932Z cpu62:68755 opID=da63ffd6)Config: 706: “VMOverheadGrowthLimit” = 4294967295, Old Value: 0, (Status: 0x0)
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a maps to vmkernel opID ea54486e
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-1 to 3,old threshold: 3
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-1 to 75, old threshold: 75
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-1 to Inactive, old:Inactive
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-1
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-0 to 3,old threshold: 3
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-0 to 75, old threshold: 75
2019-06-12T16:48:10.042Z cpu46:67939 opID=ea54486e)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-0 to Inactive, old:Inactive
2019-06-12T16:48:10.042Z cpu46:67939 opID=ea54486e)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-0
2019-06-12T16:48:10.192Z cpu41:68751 opID=562a74d)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-4e-9c86 maps to vmkernel opID 562a74d
2019-06-12T16:48:10.192Z cpu41:68751 opID=562a74d)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-1
2019-06-12T16:48:10.193Z cpu41:68751 opID=562a74d)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-1
2019-06-12T16:48:10.200Z cpu47:68752 opID=9c88f0c9)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-fe-9c87 maps to vmkernel opID 9c88f0c9
2019-06-12T16:48:10.200Z cpu47:68752 opID=9c88f0c9)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-0
2019-06-12T16:48:10.202Z cpu47:68752 opID=9c88f0c9)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-0

 

Hostd:

 

  • Checked the host d but was not able to track the ongoing issue between the Esxi host and the vCenter.

 

2019-06-12T16:48:10.000Z info hostd[A2E9B80] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] Task Created : haTask-ha-host-vim.host.NetworkSystem.invokeHostTransactionCall-168167914
2019-06-12T16:48:10.003Z info hostd[116C6B70] [Originator@6876 sub=VsanSimsStubImpl opID=b25d9c66] Calling vim.host.VsanSystemEx.GetVsanNics
2019-06-12T16:48:10.003Z info hostd[116C6B70] [Originator@6876 sub=VsanSimsStubImpl opID=b25d9c66] Calling vim.host.VsanSystemEx.GetVsanNics
2019-06-12T16:48:10.020Z warning hostd[A2E9B80] [Originator@6876 sub=Default opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] UpdateDvsVmwareSetting: Attempt to clear the span sessions of a switch
2019-06-12T16:48:10.041Z warning hostd[A2E9B80] [Originator@6876 sub=Default opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] UpdateDvsVmwareSetting: Attempt to clear the span sessions of a switch

 

 

Hostname: hpnpvvdivh102.gso.abcd.com 

 

Esxi Version: VMware ESXi 6.5.0 build-7967591

 

VMKernel Logs:

 

  • From the logs we are can see the same trend of events being generated.

 

2019-06-12T16:48:13.212Z cpu34:68644 opID=508454be)Config: 706: “VMOverheadGrowthLimit” = 4294967295, Old Value: 0, (Status: 0x0)
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)World: 12235: VC opID HB-host-81551@97723-6edbaed6-DvsHandleHostReconnect-f300b5f-fd-fc00 maps to vmkernel opID 7796390a
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-1 to 3,old threshold: 3
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-1 to 75, old threshold: 75
2019-06-12T16:48:13.512Z cpu36:68649 opID=7796390a)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-1 to Inactive, old:Inactive
2019-06-12T16:48:13.512Z cpu36:68649 opID=7796390a)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-1
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to  check if port is Uplink : Failure

 

 

Conclusion:

 

 

  • Based on the logs we can see no failure or configurational issue from the Dvport as well as the Ports.
  • To understand more about the issue we need to capture the Network packet traces between the Esxi host as well as the Esxi host so that we can confirm that this is a Hardware Issue.
  • Also As per my understanding we have seen this issue on Two Different cluster having different network as well as host configuration. I will recommend you to find an Intermittent device which is common between the above mentioned host and check if there is any configurational issues. This is because we are getting LAG Create Events in the vmkernel Logs and if we force the physical links down, we may encounter the LAG CREATE at a later time frame.

 

 

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.

Leave a Reply