Host abcdvvdivh106.gso.abcd.com in IPC2-VDI is not responding
Host hpnpvvdivh102.gso.abcd.com in IPC2-VDI is not responding
Time stamp of the issue:6/12/2019 9:48:05 AM MST
Time Stamp in GMT:: 16:48 12.06.2019 GMT
vCenter Logs:
- From the vCenter logs we can see that around 16:44 we can see that Multiple Host got disconnected from the vCenter Server.
2019-06-12T16:44:13.085Z warning vpxd[7F12837EF700] [Originator@6876 sub=vmomi.soapStub[18334] opID=287c5427] Terminating invocation: server=<cs p:00007f124c0ee530, TCP:hpnpvvdivh04.gso.abcd.com:443>, moref=vim.encryption.CryptoManager:ha-crypto-manager, method=IsEnabled
2019-06-12T16:44:49.680Z warning vpxd[7F12A04EF700] [Originator@6876 sub=VpxProfiler opID=HB-host-81554@97888-2850012f-EventManagerProcessJobs-5f991d3] EventManagerProcessJobs [TotalTime] took 304150 ms
9-06-13T02:57:32.902Z info vpxd[7F12A1188700] [Originator@6876 sub=Default opID=540c73c8] [VpxLRO] — ERROR lro-348456864 — guestOperationsAuthManager — vim.vm.guest.AuthManager.acquireCredentials: vim.fault.GuestAuthenticationChallenge:
2019-06-12T16:44:49.782Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host abcdvvdivh106.gso.abcd.com
2019-06-12T16:44:49.798Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh03.gso.abcd.com
2019-06-12T16:44:50.964Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh104.gso.abcd.com
2019-06-12T16:44:52.178Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh04.gso.abcd.com
2019-06-12T16:44:53.029Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh102.gso.abcd.com
2019-06-12T16:44:55.755Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh103.gso.abcd.com
2019-06-12T16:44:55.963Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh101.gso.abcd.com
2019-06-12T16:44:57.012Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh01.gso.abcd.com
2019-06-12T16:44:57.130Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh02.gso.abcd.com
2019-06-12T16:44:57.283Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh105.gso.abcd.com
2019-06-12T16:44:57.396Z info vpxd[7F12A1815700] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-23776a9b] [VpxdIntHost] Missed 4 heartbeats for host hpnpvvdivh05.gso.abcd.com
Hostname: abcdvvdivh106.gso.abcd.com
vCenter with IP 11.11.6.42
VOBD:
Doesn’t have logs at the time of issue
VPXA:
Doesn’t have any information when we have seen the disconnection
VMKernel Logs:
- From the VMKernel Logs we can see that there is event for VMKAPIMOD after which there is a LAG Create Event.
2019-06-12T16:48:09.932Z cpu62:68755 opID=da63ffd6)World: 12235: VC opID HB-SpecSync-host-81563@110-d4b1b29-19-9c56 maps to vmkernel opID da63ffd6
2019-06-12T16:48:09.932Z cpu62:68755 opID=da63ffd6)Config: 706: “VMOverheadGrowthLimit” = 4294967295, Old Value: 0, (Status: 0x0)
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a maps to vmkernel opID ea54486e
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-1 to 3,old threshold: 3
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-1 to 75, old threshold: 75
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-1 to Inactive, old:Inactive
2019-06-12T16:48:10.021Z cpu46:67939 opID=ea54486e)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-1
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-0 to 3,old threshold: 3
2019-06-12T16:48:10.041Z cpu46:67939 opID=ea54486e)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-0 to 75, old threshold: 75
2019-06-12T16:48:10.042Z cpu46:67939 opID=ea54486e)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-0 to Inactive, old:Inactive
2019-06-12T16:48:10.042Z cpu46:67939 opID=ea54486e)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-0
2019-06-12T16:48:10.192Z cpu41:68751 opID=562a74d)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-4e-9c86 maps to vmkernel opID 562a74d
2019-06-12T16:48:10.192Z cpu41:68751 opID=562a74d)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-1
2019-06-12T16:48:10.193Z cpu41:68751 opID=562a74d)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-1
2019-06-12T16:48:10.200Z cpu47:68752 opID=9c88f0c9)World: 12235: VC opID HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-fe-9c87 maps to vmkernel opID 9c88f0c9
2019-06-12T16:48:10.200Z cpu47:68752 opID=9c88f0c9)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-0
2019-06-12T16:48:10.202Z cpu47:68752 opID=9c88f0c9)lacp: LACPDisableDVPort:4700: LACP is not enabled on portset DvsPortset-0
Hostd:
- Checked the host d but was not able to track the ongoing issue between the Esxi host and the vCenter.
2019-06-12T16:48:10.000Z info hostd[A2E9B80] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] Task Created : haTask-ha-host-vim.host.NetworkSystem.invokeHostTransactionCall-168167914
2019-06-12T16:48:10.003Z info hostd[116C6B70] [Originator@6876 sub=VsanSimsStubImpl opID=b25d9c66] Calling vim.host.VsanSystemEx.GetVsanNics
2019-06-12T16:48:10.003Z info hostd[116C6B70] [Originator@6876 sub=VsanSimsStubImpl opID=b25d9c66] Calling vim.host.VsanSystemEx.GetVsanNics
2019-06-12T16:48:10.020Z warning hostd[A2E9B80] [Originator@6876 sub=Default opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] UpdateDvsVmwareSetting: Attempt to clear the span sessions of a switch
2019-06-12T16:48:10.041Z warning hostd[A2E9B80] [Originator@6876 sub=Default opID=HB-host-81563@21969-4eda28e-DvsHandleHostReconnect-451141a4-9b-9c6a user=vpxuser:vpxuser] UpdateDvsVmwareSetting: Attempt to clear the span sessions of a switch
Hostname: hpnpvvdivh102.gso.abcd.com
Esxi Version: VMware ESXi 6.5.0 build-7967591
VMKernel Logs:
- From the logs we are can see the same trend of events being generated.
2019-06-12T16:48:13.212Z cpu34:68644 opID=508454be)Config: 706: “VMOverheadGrowthLimit” = 4294967295, Old Value: 0, (Status: 0x0)
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)World: 12235: VC opID HB-host-81551@97723-6edbaed6-DvsHandleHostReconnect-f300b5f-fd-fc00 maps to vmkernel opID 7796390a
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)Team.etherswitch: TeamESLACPLAGEventCB:6277: Received a LAG CREATE event version :2, lagId :0, lagLinkStatus :NOT USED,lagName :, uplinkName :, portLinkStatus :NOT USED, portID :0x0
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)netioc: NetIOCSetRespoolVersion:245: Set netioc version for portset: DvsPortset-1 to 3,old threshold: 3
2019-06-12T16:48:13.511Z cpu36:68649 opID=7796390a)netioc: NetIOCSetupUplinkReservationThreshold:135: Set threshold for portset: DvsPortset-1 to 75, old threshold: 75
2019-06-12T16:48:13.512Z cpu36:68649 opID=7796390a)netioc: NetIOCPortsetNetSchedStatusSet:1207: Set sched status for portset: DvsPortset-1 to Inactive, old:Inactive
2019-06-12T16:48:13.512Z cpu36:68649 opID=7796390a)VLANMTUCheck: NMVCDeployClear:871: can’t not find psReq for ps DvsPortset-1
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
2019-06-12T16:48:13.531Z cpu36:68649 opID=7796390a)VMKAPIMOD: 86: Failed to check if port is Uplink : Failure
Conclusion:
- Based on the logs we can see no failure or configurational issue from the Dvport as well as the Ports.
- To understand more about the issue we need to capture the Network packet traces between the Esxi host as well as the Esxi host so that we can confirm that this is a Hardware Issue.
- From Esxi Host please run the command: tcpdump-uw -i vmk0 ( For more information please refer: https://kb.vmware.com/s/article/1031186)
- From vCenter Server please run the command: Tcpdump -I eth0 host
- For more information on how to run the command please refer: https://kb.vmware.com/s/article/2084896
- Also As per my understanding we have seen this issue on Two Different cluster having different network as well as host configuration. I will recommend you to find an Intermittent device which is common between the above mentioned host and check if there is any configurational issues. This is because we are getting LAG Create Events in the vmkernel Logs and if we force the physical links down, we may encounter the LAG CREATE at a later time frame.