RCA 40: ESXi Host Disconnect and Reconnect from vCenter

The issue occurred on April 2nd at around 3:30 PM and 8:54 PM IST.

 

Host ID     IP address      Hostname
host-33     10.70.111.19    xyzmauto1-01.abc.com
host-185    10.70.111.28    xyzmwks1-02.abc.com

 

Time of Issue: 2nd April 3:30 PM and 8:54 PM IST

Time in GMT: 2nd April 10:00 AM GMT and 3:24 PM GMT
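The IST to GMT conversion above matters because every log excerpt in this RCA is stamped in UTC (the trailing Z). A minimal sketch of the arithmetic, assuming the year 2021 (taken from the log timestamps) and a hypothetical helper name ist_to_utc:

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def ist_to_utc(local: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' IST wall-clock time to a UTC timestamp string."""
    dt = datetime.strptime(local, "%Y-%m-%d %H:%M").replace(tzinfo=IST)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


# Reported issue times; the year is an assumption based on the log excerpts.
reported_ist = ["2021-04-02 15:30", "2021-04-02 20:54"]
converted = [ist_to_utc(t) for t in reported_ist]
print(converted)  # ['2021-04-02T10:00Z', '2021-04-02T15:24Z']
```

These are the windows to search for in the vpxd, hostd, vmkernel, and fdm logs.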

  

vCenter Name: abcvc1.abc.com

vCenter Version: vCenter Appliance 6.7 Update 3l

 

 

VPXD Logs:

 

  • From the VPXD Logs we can see the below Events being generated for the Missing Heartbeats.

 

2021-04-02T09:43:27.616Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]

 

2021-04-02T15:24:47.810Z warning vpxd[01144] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-185-4ba9d84] Connection not alive due to missing heartbeats; [vim.HostSystem:host-185,xyzmwks1-02.abc.com], cnx: 528bfe40-4a2a-b7df-1fa9-5d6a59c566c3
2021-04-02T15:24:47.811Z warning vpxd[01150] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-33-5558d508] Connection not alive due to missing heartbeats; [vim.HostSystem:host-33,xyzmauto1-01.abc.com], cnx: 52d62af2-937e-8cbe-b97b-363dea0accd3

2021-04-02T16:08:56.369Z warning vpxd[04970] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-33-77174bb2] Connection not alive due to missing heartbeats; [vim.HostSystem:host-33,xyzmauto1-01.abc.com], cnx:

 

  • From the logs we can see the same events being generated for other ESXi hosts as well:

 

2021-04-02T15:24:47.820Z warning vpxd[05210] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-189-14a13fc9] Connection not alive due to missing heartbeats; [vim.HostSystem:host-189,xyzmwks1-03.abc.com], cnx: 526f6ae3-7b8f-219f-2d4c-207bcc7c584e
2021-04-02T15:24:47.820Z warning vpxd[06887] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-112-58fd4548] Connection not alive due to missing heartbeats; [vim.HostSystem:host-112,xyzmsql1-02.abc.com], cnx: 52cd6735-9e12-acee-c58b-2d6ff334eecb
2021-04-02T15:24:47.820Z warning vpxd[04973] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-150687-5bcfa79c] Connection not alive due to missing heartbeats; [vim.HostSystem:host-150687,xyzmwks-02.abc.com], cnx: 52ae4d2f-2517-8925-552a-1517122e6db0
2021-04-02T15:24:47.827Z warning vpxd[04926] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-160-4f54c3e7] Connection not alive due to missing heartbeats; [vim.HostSystem:host-160,xyzmwks1-01.abc.com], cnx: 52fbd9a0-1a72-eeef-590f-a7976413be72
2021-04-02T15:25:26.950Z warning vpxd[01168] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-150693-615a69a] Connection not alive due to missing heartbeats; [vim.HostSystem:host-150693,xyzmwks-03.abc.com], cnx: 52c6f401-ce44-ae56-7084-809cd1f05dd3
2021-04-02T16:31:27.489Z warning vpxd[04934] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-286-31275705] Connection not alive due to missing heartbeats; [vim.HostSystem:host-286,xyzmauto1-07.abc.com], cnx:
2021-04-02T16:41:14.949Z warning vpxd[04912] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-278-681804c3] Connection not alive due to missing heartbeats; [vim.HostSystem:host-278,xyzmauto1-08.abc.com], cnx:
2021-04-03T03:03:57.478Z warning vpxd[06891] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-215-4ce74329] Connection not alive due to missing heartbeats; [vim.HostSystem:host-215,xyzmnveg1-01.abc.com], cnx: 521ad9e0-bbc7-9c93-8cee-4244d1169086
2021-04-03T04:46:04.123Z warning vpxd[05486] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-241-4ddfd5cc] Connection not alive due to missing heartbeats; [vim.HostSystem:host-241,xyzmnveg1-02.abc.com], cnx: 528d262b-0e33-1db8-67c3-2269460c0b23
2021-04-03T07:18:36.829Z warning vpxd[06076] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-219-23d159c6] Connection not alive due to missing heartbeats; [vim.HostSystem:host-219,xyzmnveg1-03.abc.com], cnx: 529e03db-ed34-71d6-171a-17b657cde121

 

  • We can see the below Events for the Missing Heartbeats:

 

2021-04-02T09:41:57.585Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 1 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]
2021-04-02T09:42:09.957Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 1 heartbeats for host [vim.HostSystem:host-33,xyzmauto1-01.abc.com]
2021-04-02T09:42:30.007Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-281,xyzmauto1-05.abc.com]
2021-04-02T09:42:37.601Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 2 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]
2021-04-02T09:42:47.231Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-73273,xyzmwks1-05.abc.com]
2021-04-02T09:43:03.600Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-145563,xyzmwks1-08.abc.com]

2021-04-02T09:43:08.567Z info vpxd[04904] [Originator@6876 sub=QuickStats opID=PollQuickStatsLoop-2872edaf] Cancel pending RPC at host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]
2021-04-02T09:43:08.567Z info vpxd[04904] [Originator@6876 sub=QuickStats opID=PollQuickStatsLoop-2872edaf] Cancel pending RPC at host [vim.HostSystem:host-33,xyzmauto1-01.abc.com]

2021-04-02T09:43:27.616Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]
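When many hosts are involved, it can help to summarize the "Missed N heartbeats" entries per host rather than reading them line by line. The following is a rough sketch, not a VMware tool: the regular expression is an assumption derived only from the line format shown in the excerpt above, and max_missed_per_host is a hypothetical helper name.

```python
import re

# Pattern inferred from the vpxd excerpt above; other vpxd builds may log differently.
MISSED = re.compile(
    r"^(?P<ts>\S+) info vpxd\[\d+\] .*?Missed (?P<n>\d+) heartbeats for host "
    r"\[vim\.HostSystem:(?P<moid>host-\d+),(?P<name>[^\]]+)\]"
)


def max_missed_per_host(lines):
    """Return {hostname: (highest missed count, timestamp it was first reached)}."""
    worst = {}
    for line in lines:
        m = MISSED.search(line)
        if not m:
            continue
        count = int(m["n"])
        if count > worst.get(m["name"], (0, ""))[0]:
            worst[m["name"]] = (count, m["ts"])
    return worst


sample = [
    "2021-04-02T09:41:57.585Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 1 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]",
    "2021-04-02T09:42:09.957Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 1 heartbeats for host [vim.HostSystem:host-33,xyzmauto1-01.abc.com]",
    "2021-04-02T09:42:37.601Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 2 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]",
    "2021-04-02T09:43:27.616Z info vpxd[04997] [Originator@6876 sub=InvtHostCnx opID=HeartbeatStartHandler-27c64a89] Missed 4 heartbeats for host [vim.HostSystem:host-185,xyzmwks1-02.abc.com]",
]

summary = max_missed_per_host(sample)
for host, (count, ts) in sorted(summary.items()):
    print(f"{host}: up to {count} missed heartbeats (at {ts})")
```

In practice the lines would be read from the vpxd log file in the vCenter support bundle instead of the inline sample.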

 

  • The connection status has changed:

 

2021-04-02T09:43:55.669Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-73273 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-04-02T09:43:55.672Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-160 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-04-02T09:43:55.674Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-189 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-04-02T09:43:55.675Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-185 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-04-02T09:43:55.676Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-73270 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)

2021-04-02T09:43:55.677Z info vpxd[04936] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c93-451c0aae] VC state for host host-73276 (initialized -> initialized), FDM state (Master -> UNKNOWN_FDM_HSTATE), src of state (null -> null)

 

  • Here we can see the sign of an intermittent Network issue.

 

2021-04-02T09:44:00.513Z error vpxd[05493] [Originator@6876 sub=Vmomi opID=HB-host-33@14677-10ce379] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem,  timeout, or service overload.)
2021-04-02T09:44:00.514Z warning vpxd[05493] [Originator@6876 sub=VpxProfiler opID=HB-host-33@14677-10ce379] DoHostSync:host-33 [GetChangesTime] took 110556 ms
2021-04-02T09:44:00.514Z warning vpxd[05493] [Originator@6876 sub=VpxProfiler opID=HB-host-33@14677-10ce379] DoHostSync:host-33 [DoHostSyncTime] took 110556 ms
2021-04-02T09:44:00.514Z warning vpxd[05493] [Originator@6876 sub=InvtHostCnx opID=HB-host-33@14677-10ce379] Exception occurred during host sync; Host communication failed; [vim.HostSystem:host-33,xyzmauto1-01.abc.com], e: N5Vmomi5Fault17HostCommunication9ExceptionE(Fault cause: vmodl.fault.HostCommunication
2021-04-02T09:44:00.516Z warning vpxd[05493] [Originator@6876 sub=MoHost opID=HB-host-33@14677-10ce379] host [vim.HostSystem:host-33,xyzmauto1-01.abc.com] connection state changed to NO_RESPONSE
2021-04-02T09:44:00.524Z warning vpxd[05493] [Originator@6876 sub=VpxProfiler opID=HB-host-33@14677-10ce379] InvtHostSyncLRO::StartWork [HostSyncTime] took 110566 ms

 

 

2021-04-02T09:44:07.731Z info vpxd[04979] [Originator@6876 sub=MoHost opID=FdmMonitor-domain-c26-28e83140] VC state for host host-33 (initialized -> initialized), FDM state (Live -> UNKNOWN_FDM_HSTATE), src of state (null -> null)

 

 

 

 

Hostname: xyzmauto1-01.abc.com

ESXi Version: ESXi 6.7 EP18

 

UUID: 192d1376-c4e158aa-eceb-4702467bbc08
Display Name: Local LSI Disk (naa.618e728372ef3f50248cf91f0b397658)
Vendor: LSI | Model: UCSB-MRAID12G-HE | Is Local: true | Is SSD: false

 

vmnic   PCI bus address  link  speed  duplex  MTU   driver  driver version  firmware version  MAC address        VID   DID   SVID  SDID  name
------  ---------------  ----  -----  ------  ----  ------  --------------  ----------------  -----------------  ----  ----  ----  ----  ----------------------------------------
vmnic0  0000:67:00.0     Up    20000  Full    1500  nenic   1.0.33.0        4.4(2e)           00:25:b5:18:00:04  1137  0043  1137  012c  Cisco Systems Inc Cisco VIC Ethernet NIC
vmnic1  0000:62:00.0     Up    20000  Full    1500  nenic   1.0.33.0        4.4(2e)           00:25:b5:18:00:05  1137  0043  1137  012c  Cisco Systems Inc Cisco VIC Ethernet NIC

 

 

vmnic1

device {

   -- General Statistics:

   Rx Packets:966052216

   Tx Packets:1123115435

   Rx Bytes:1157395411361

   Tx Bytes:1519840228126

   Rx Errors:0

   Tx Errors:0

   Rx Dropped:3028

   Tx Dropped:0

   Rx Multicast:35109775

 

vmnic0

device {

   -- General Statistics:

   Rx Packets:709719145

   Tx Packets:848594242

   Rx Bytes:823037142048

   Tx Bytes:1108263790091

   Rx Errors:0

   Tx Errors:0

   Rx Dropped:11547

   Tx Dropped:0

   Rx Multicast:35108161
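To put the Rx Dropped counters for xyzmauto1-01 in proportion, they can be compared with the number of packets received. This is only a rough sketch: rx_drop_ratio is a hypothetical helper, the counters are cumulative totals (so a small overall ratio does not rule out short bursts of loss), and dividing by Rx Packets is an assumption about how the two counters relate.

```python
def rx_drop_ratio(rx_packets: int, rx_dropped: int) -> float:
    """Dropped receive packets as a fraction of received packets."""
    return rx_dropped / rx_packets if rx_packets else 0.0


# (Rx Packets, Rx Dropped) copied from the statistics above for xyzmauto1-01.
nics = {
    "vmnic0": (709719145, 11547),
    "vmnic1": (966052216, 3028),
}

ratios = {name: rx_drop_ratio(*counters) for name, counters in nics.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.6%} of received packets dropped")
```

Comparing two snapshots taken before and after an incident window would be more telling than a single cumulative reading.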

 

 

VOBD Logs:

 

  • No Logging for the time of issue.

 

VMKernel Logs:

 

 

2021-04-02T08:38:37.766Z cpu32:2099948)Config: 703: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
2021-04-02T08:42:49.948Z cpu34:2098036)ScsiDeviceIO: 3469: Cmd(0x45bb011b6dc0) 0x1a, CmdSN 0x55354 from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2021-04-02T08:43:17.076Z cpu6:2097987)DVFilter: 6053: Checking disconnected filters for timeouts

2021-04-02T09:35:45.019Z cpu8:2098035)ScsiDeviceIO: 3469: Cmd(0x459b01781600) 0x1a, CmdSN 0x55779 from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2021-04-02T09:40:45.066Z cpu25:2098036)ScsiDeviceIO: 3469: Cmd(0x45bb011b9ac0) 0x1a, CmdSN 0x557de from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2021-04-02T09:55:38.501Z cpu0:2098035)ScsiDeviceIO: 3469: Cmd(0x459b01745600) 0x1a, CmdSN 0x558f4 from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2021-04-02T10:00:38.553Z cpu25:2098036)ScsiDeviceIO: 3469: Cmd(0x45bb0d1c64c0) 0x1a, CmdSN 0x55959 from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

 

  • Other than this we are not able to see any issues in the logs.
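For reference, the ScsiDeviceIO entries above decode to a MODE SENSE(6) command (opcode 0x1a) that the local disk rejected with CHECK CONDITION, sense key ILLEGAL REQUEST, and additional sense code 0x24/0x00 (INVALID FIELD IN CDB). The sketch below shows that decoding with small lookup tables that cover only a handful of standard SCSI codes; the function name decode_scsi_failure and the regular expression are assumptions based on the line format in this excerpt.

```python
import re

# Partial lookup tables of standard SCSI codes; only a few values are listed.
OPCODES = {0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}

# Pattern inferred from the vmkernel lines above.
PATTERN = re.compile(
    r"Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),.*?"
    r"H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) "
    r"Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)"
)


def decode_scsi_failure(line: str):
    """Translate the numeric fields of one ScsiDeviceIO failure line into names."""
    m = PATTERN.search(line)
    if not m:
        return None
    opcode, host, device, plugin, key, asc, ascq = (int(v, 16) for v in m.groups())
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "host_status": host,
        "device_status": DEVICE_STATUS.get(device, hex(device)),
        "plugin_status": plugin,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional_sense": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


line = (
    '2021-04-02T09:40:45.066Z cpu25:2098036)ScsiDeviceIO: 3469: Cmd(0x45bb011b9ac0) 0x1a, '
    'CmdSN 0x557de from world 0 to dev "naa.618e728372ef3f50248cf91f0b397658" failed '
    'H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.'
)
decoded = decode_scsi_failure(line)
print(decoded)
```

A rejected MODE SENSE on a local RAID volume points to an unsupported query rather than a failing I/O path, which fits the conclusion that these entries are unrelated to the host disconnects.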

 

 

Hostd Logs:

 

  • Reviewed the logs but was unable to find any errors for the time of issue:

 

2021-04-02T09:45:56.299Z info hostd[2101879] [Originator@6876 sub=Vimsvc.TaskManager opID=52c7a33e-e0f1 user=vpxuser] Task Completed : haTask-ha-host-vim.host.NetworkSystem.commitTransaction-3557940785 Status success
2021-04-02T09:45:56.499Z info hostd[2101879] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-33@14677-21cc38f0-DvsHandleHostReconnect-52af214c-4f-e0fa user=vpxuser] Task Created : haTask–vim.dvs.HostDistributedVirtualSwitchManager.applyDVPort-3557940788
2021-04-02T09:45:56.538Z info hostd[2101879] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=HB-host-33@14677-21cc38f0-DvsHandleHostReconnect-52af214c-4f-e0fa user=vpxuser] Event 1464 : New ports were created in the vSphere Distributed Switch  in ha-datacenter.
2021-04-02T09:45:56.538Z info hostd[2101879] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-33@14677-21cc38f0-DvsHandleHostReconnect-52af214c-4f-e0fa user=vpxuser] Task Completed : haTask–vim.dvs.HostDistributedVirtualSwitchManager.applyDVPort-3557940788 Status success
2021-04-02T09:46:00.094Z info hostd[2100402] [Originator@6876 sub=Libs opID=HB-host-33@14685-7d253fe0-3b-e114 user=vpxuser] NetstackInstanceImpl: congestion control algorithm: newreno
2021-04-02T09:46:00.096Z warning hostd[2100402] [Originator@6876 sub=Hostsvc.Tpm20Provider opID=HB-host-33@14685-7d253fe0-3b-e114 user=vpxuser] Unable to retrieve TPM/TXT status. TPM functionality will be unavailable. Failure reason: Unable to get node: Sysinfo error: Not foundSee VMkernel log for details..
2021-04-02T09:46:00.115Z info hostd[2100402] [Originator@6876 sub=Libs opID=HB-host-33@14685-7d253fe0-3b-e114 user=vpxuser] Could not expand environment variable HOME.
2021-04-02T09:46:00.118Z info hostd[2100402] [Originator@6876 sub=Libs opID=HB-host-33@14685-7d253fe0-3b-e114 user=vpxuser] Could not expand environment variable HOME.
2021-04-02T09:46:01.538Z info hostd[2099495] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-SpecSync-host-33@4524-1884cb50-71-e117 user=vpxuser] Task Created : haTask-ha-host-vim.option.OptionManager.updateValues-3557940795
2021-04-02T09:46:01.539Z info hostd[2099495] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-SpecSync-host-33@4524-1884cb50-71-e117 user=vpxuser] Task Completed : haTask-ha-host-vim.option.OptionManager.updateValues-3557940795 Status success
2021-04-02T09:46:36.201Z info hostd[2099494] [Originator@6876 sub=Default opID=72317b9b-38-4f31 user=vpxuser:WINNTDOM\hy_veeamone] IPMI SEL sync took 0 seconds 0 sel records, last 901
2021-04-02T09:46:36.202Z info hostd[2099338] [Originator@6876 sub=Libs opID=5cf5e1af] NetstackInstanceImpl: congestion control algorithm: newreno
2021-04-02T09:47:40.216Z warning hostd[2099948] [Originator@6876 sub=Hostsvc.Tpm20Provider opID=HB-host-33@14689-5c4d09b4-20-e239 user=vpxuser] Unable to retrieve TPM/TXT status. TPM functionality will be unavailable. Failure reason: Unable to get node: Sysinfo error: Not foundSee VMkernel log for details..
2021-04-02T09:47:40.234Z info hostd[2099948] [Originator@6876 sub=Libs opID=HB-host-33@14689-5c4d09b4-20-e239 user=vpxuser] Could not expand environment variable HOME.

 

FDM Logs:

 

 hostId=host-33 state=Slave master=host-281

 

2021-04-02T09:40:34.853Z info fdm[2184558] [Originator@6876 sub=Cluster opID=SWI-3ab50c2a] hostId=host-33 state=Slave master=host-281 isolated=false host-list-version=132 config-version=357 vm-metadata-version=1810 slv-mst-tdiff-sec=0
2021-04-02T09:45:49.525Z warning fdm[2184633] [Originator@6876 sub=Default] [FdmAccessChecker::Validate] No auth data found for privileged operation GetAbout (session=522a6ea7-ceac-703c-89c1-a90ebc312325)
2021-04-02T09:45:49.525Z verbose fdm[2184633] [Originator@6876 sub=Default] CloseSession called for session id=522a6ea7-ceac-703c-89c1-a90ebc312325
2021-04-02T09:45:49.533Z info fdm[2184547] [Originator@6876 sub=vpxLro opID=376b1202] [VpxLRO] -- BEGIN lro-7 -- fdmService -- csi.FdmService.login -- 52a0fa73-68a1-b910-bf30-04dc1a908a7e
2021-04-02T09:45:49.542Z verbose fdm[2184547] [Originator@6876 sub=HttpConnectionPool-000004 opID=376b1202] HttpConnectionPoolImpl created. maxPoolConnections = 1; idleTimeout = 900000000; maxOpenConnections = 1; maxConnectionAge = 0
2021-04-02T09:45:49.563Z verbose fdm[2184547] [Originator@6876 sub=vmomi.soapStub[3] opID=376b1202] Resetting stub adapter for server <cs p:000000a26f3ff7b0, TCP:localhost:443> : Closed
2021-04-02T09:45:49.564Z verbose fdm[2184547] [Originator@6876 sub=HalCnx opID=376b1202] Authenticate succeeded: userName=vpxuser
2021-04-02T09:45:49.564Z info fdm[2184547] [Originator@6876 sub=vpxLro opID=376b1202] [VpxLRO] -- FINISH lro-7
2021-04-02T09:45:49.805Z error fdm[2184641] [Originator@6876 sub=Vmomi opID=22d68aa9] Caught exception while sending activation result: N5Vmomi5Fault11SystemError9ExceptionE(Fault cause: vmodl.fault.SystemError
--> )
--> [context]zKq7AVECAAAAALd+BQEUZmRtAADsvodmZG0AAMCieQBJcHcANP1wAOgCcQDP65QAluGUAONndQDBnnUAMqR1AHnZcQANAHQAKmF0AFB8dAD6BnUANa58ANOzfABrIIgBO30AbGlicHRocmVhZC5zby4wAAJ9nw5saWJjLnNvLjYA[/context]
2021-04-02T09:47:16.263Z error fdm[2184548] [Originator@6876 sub=Vmomi opID=650e8fe3] Caught exception while sending activation result: N5Vmomi5Fault11SystemError9ExceptionE(Fault cause: vmodl.fault.SystemError
--> )
--> [context]zKq7AVECAAAAALd+BQEUZmRtAADsvodmZG0AAMCieQBJcHcANP1wAOgCcQDP65QAluGUAONndQDBnnUAMqR1AHnZcQANAHQAKmF0AFB8dAD6BnUANa58ANOzfABrIIgBO30AbGlicHRocmVhZC5zby4wAAJ9nw5saWJjLnNvLjYA[/context]
2021-04-02T13:30:42.349Z info fdm[2184630] [Originator@6876 sub=FdmDump] BEGIN DUMP
--> Time=2021-04-02T13:30:42.348579Z
--> OpId=
--> Dump Reason=User
-->
--> MODULE=FdmService
--> Cluster state: Slave (2)
-->
--> Slave states (0):
-->
--> Protected vms (0):
-->
--> Unprotect request vms (0):
-->
--> Locked datastores (0):
-->
--> Events (3):
-->     EventEx=com.vmware.vc.HA.AllIsoAddrsPingable vm= host=host-33 tag=host-33:969119230:2
-->     EventEx=com.vmware.vc.HA.AllHostAddrsPingable vm= host=host-33 tag=host-33:969119230:1
-->     EventEx=com.vmware.vc.HA.ConnectedToMaster vm= host=host-33 tag=host-33:969119230:0
-->
--> MODULE=InventoryManager
--> Cluster state: 2
--> Flatten shares enabled: true
--> Masking cleanShutdown for inaccessible VMs enabled: true
--> Host agent connection state: Connected
--> Pending changes in update queue: 0
-->

 

 

 

Hostname: xyzmwks1-02.

ESXi Version: ESXi 6.7 EP18

 

UUID: 41bb8a7c-d4dfb8b6-8165-a95348589f93
Display Name: Local LSI Disk (naa.618e728372ef58002490b04c0c3b8b96)

 

vmnic   PCI bus address  link  speed  duplex  MTU   driver  driver version  firmware version  MAC address        VID   DID   SVID  SDID  name
------  ---------------  ----  -----  ------  ----  ------  --------------  ----------------  -----------------  ----  ----  ----  ----  ----------------------------------------
vmnic0  0000:67:00.0     Up    20000  Full    1500  nenic   1.0.33.0        4.4(2e)           00:25:b5:18:00:28  1137  0043  1137  012c  Cisco Systems Inc Cisco VIC Ethernet NIC
vmnic1  0000:62:00.0     Up    20000  Full    1500  nenic   1.0.33.0        4.4(2e)           00:25:b5:18:00:29  1137  0043  1137  012c  Cisco Systems Inc Cisco VIC Ethernet NIC

 

vmnic0

device {

   -- General Statistics:

   Rx Packets:425057313

   Tx Packets:236595930

   Rx Bytes:440401291692

   Tx Bytes:117101324416

   Rx Errors:0

   Tx Errors:0

   Rx Dropped:33756

   Tx Dropped:0

 

vmnic1

device {

   -- General Statistics:

   Rx Packets:16191512227

   Tx Packets:9354395379

   Rx Bytes:22318573365334

   Tx Bytes:9287922243634

   Rx Errors:0

   Tx Errors:0

   Rx Dropped:217425

   Tx Dropped:0

   Rx Multicast:33324658

 

 

VOBD Logs:

 

  • No Logging for the time of issue.

 

VMKernel Logs:

 

  • No errors are logged at the time of the issue other than the SCSI errors, which can be ignored for the local disk.

 

 

Hostd Logs:

 

 

2021-04-02T09:16:04.540Z error hostd[2215224] [Originator@6876 sub=Default] [LikewiseGetDomainJoinInfo:354] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
-->
2021-04-02T09:16:20.124Z warning hostd[2099352] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx :214689
-->

2021-04-02T09:30:20.113Z warning hostd[2099950] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx :215086
-->

2021-04-02T10:03:00.132Z warning hostd[2099403] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx :216139
-->
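The three droppedRx samples that hostd logged for vmnic1 show the counter still increasing around the time of the issue. A minimal sketch of the arithmetic follows; the helper names parse and drop_deltas are hypothetical, and three samples are far too few to draw a firm conclusion about the drop rate.

```python
from datetime import datetime

# (timestamp, droppedRx) pairs copied from the hostd warnings above.
samples = [
    ("2021-04-02T09:16:20.124Z", 214689),
    ("2021-04-02T09:30:20.113Z", 215086),
    ("2021-04-02T10:03:00.132Z", 216139),
]


def parse(ts: str) -> datetime:
    """Parse the UTC timestamp format used in the hostd log."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def drop_deltas(points):
    """Return (newly dropped packets, elapsed seconds) for each consecutive pair."""
    result = []
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        result.append((c1 - c0, (parse(t1) - parse(t0)).total_seconds()))
    return result


deltas = drop_deltas(samples)
for dropped, seconds in deltas:
    print(f"{dropped} packets dropped in {seconds:.0f} s ({dropped / seconds:.2f} per second)")
```

Both intervals show additional drops on vmnic1, which supports asking the network team to investigate packet loss on this uplink.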

 

 

Conclusion:

 

  • Based on the logs, the issue started when vCenter missed heartbeats from the ESXi hosts. Once consecutive heartbeats are missed, vCenter marks the host as not responding and it shows as disconnected; the host then reconnects on its own.

 

  • On the ESXi hosts, the following NIC driver and firmware versions are running:

 

Driver Version: 1.0.33.0

Firmware Version: 4.4(2e)

 

As per the VMware HCL: https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=37360&deviceCategory=io&details=1&VID=1137&DID=0043&SVID=1137&SSID=012c&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

 

  • Driver version 1.0.33.0 is listed as compatible with firmware 4.2(3). However, the hosts are running a higher firmware version, 4.4(2e), which is not verified with this driver. This could cause network-related issues on the ESXi hosts.

 

Action Plan:

 

  • Please check with your hardware vendor to confirm whether the currently installed firmware version is supported; if it is not, I recommend switching to 4.2(3).
  • Please ask the network team to check for any packet drops in the environment, which could be the reason for the host disconnections.
  • As per the Article: https://kb.vmware.com/s/article/1005757?lang=en_US
    • Increase the timeout limit in vCenter Server by editing or creating the Advanced Setting config.vpxd.heartbeat.notRespondingTimeout.

 

vSphere Web Client (Flash/Flex) & vSphere Client (HTML5):

  1. Open the vSphere Web Client or vSphere Client in a web browser and log in.
  2. Select the vCenter object from the inventory under Hosts and Clusters.
  3. Select the Manage or Configure tab.
  4. Select Settings > Advanced Settings.
  5. Click Edit.
  6. In the Key field, type:

    config.vpxd.heartbeat.notRespondingTimeout
     
  7. In the Value field, type:

    120
     
  8. Click Add.
  9. Click OK.
  10. Restart the vCenter Server service.

 

Note: Increasing the timeout is a short-term solution until any network issues can be resolved.

 

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
