RCA 29: VM Becomes inaccessible while updating BMC firmware

Incident window: 19 July 2020, 02:30 PM to 03:30 PM SGT (06:30 AM to 07:30 AM GMT).

Note: The ESXi hosts are set to UTC, so all log timestamps below are in UTC.
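To make the time-zone arithmetic explicit, here is a minimal Python sketch that converts the reported SGT window into UTC so it can be matched against the ESXi log timestamps. The helper name `sgt_to_utc` is made up for illustration; the only assumption is that SGT is a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

# Singapore Time is a fixed UTC+8 offset (no daylight saving).
SGT = timezone(timedelta(hours=8), name="SGT")


def sgt_to_utc(local_text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM' as SGT and return the same instant in UTC."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M").replace(tzinfo=SGT)
    return local.astimezone(timezone.utc)


# Incident window as reported by the customer, in SGT.
window_start = sgt_to_utc("2020-07-19 14:30")
window_end = sgt_to_utc("2020-07-19 15:30")
print(window_start.isoformat(), window_end.isoformat())
```

The resulting window (06:30 to 07:30 UTC) is the range to search for in the VOBD and VMkernel logs.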

 

| vmnic | PCI bus address | link | speed | duplex | MTU | driver | driver version | firmware version | MAC address | VID | DID | SVID | SDID | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vmnic0 | 0000:03:00.0 | Down | 0 | Half | 1500 | igb | 5.0.5.1 | 1.63, 0x80000a05 | 0c:c4:7a:90:6e:96 | 8086 | 1521 | 15d9 | 1521 | Intel Corporation I350 Gigabit Network Connection |
| vmnic1 | 0000:03:00.1 | Down | 0 | Half | 1500 | igb | 5.0.5.1 | 1.63, 0x80000a05 | 0c:c4:7a:90:6e:97 | 8086 | 1521 | 15d9 | 1521 | Intel Corporation I350 Gigabit Network Connection |
| vmnic2 | 0000:83:00.0 | Up | 10000 | Full | 1500 | ixgbe | 4.5.3-iov | 0x80000528 | 00:e0:ed:40:94:08 | 8086 | 1528 | 8086 | 1528 | Intel(R) Ethernet Controller X540-AT2 |
| vmnic3 | 0000:83:00.1 | Up | 10000 | Full | 1500 | ixgbe | 4.5.3-iov | 0x80000528 | 00:e0:ed:40:94:09 | 8086 | 1528 | 8086 | 1528 | Intel(R) Ethernet Controller X540-AT2 |

 

 

Hostname: esx-ABC2E2VDI0001XP.abc.com.sg

 

 

VOBD Logs:

 

  • The VOBD logs show an NFS server disconnect event at 06:38:13 UTC, with no other errors reported:

 

2020-07-19T06:38:13.528Z: [vmfsCorrelator] 3646187759603us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.5.2 /VDI-Desktops5 c599f9b9-80e71816-0000-000000000000 VDI-Desktops5

 

  • About 32 minutes later, the NFS server connection was restored:

 

2020-07-19T07:10:17.786Z: [vmfsCorrelator] 3648112017539us: [esx.problem.vmfs.nfs.server.restored] 192.168.5.2 /VDI-Desktops5 c599f9b9-80e71816-0000-000000000000 VDI-Desktops5

 

  • Later, at 11:58:46 UTC, the host entered maintenance mode:

 

2020-07-19T11:58:46.682Z: [GenericCorrelator] 3665420913286us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-07-19T11:58:46.682Z: [UserLevelCorrelator] 3665420913286us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-07-19T11:58:46.682Z: [UserLevelCorrelator] 3665420913508us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2020-07-19T11:58:46.789Z: [GenericCorrelator] 3665421020588us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-07-19T11:58:46.789Z: [UserLevelCorrelator] 3665421020588us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-07-19T11:58:46.789Z: [UserLevelCorrelator] 3665421020736us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
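The length of the outage can be read directly from the two VOBD events above. The following Python sketch is one possible way to do that; the regular expression and the function name `nfs_outage_seconds` are illustrative assumptions based only on the line format visible in these excerpts, not a VMware-provided parser.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2020-07-19T06:38:13.528Z: [vmfsCorrelator] 3646187759603us: [esx.problem...]
VOBD_EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z: "
    r"\[\w+\] \d+us: \[(?P<event>[\w.]+)\]"
)


def nfs_outage_seconds(lines):
    """Return seconds between the first NFS disconnect and the next restore."""
    down_at = None
    for line in lines:
        match = VOBD_EVENT.match(line)
        if not match:
            continue
        when = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        event = match["event"]
        if event.endswith("nfs.server.disconnect") and down_at is None:
            down_at = when
        elif event.endswith("nfs.server.restored") and down_at is not None:
            return (when - down_at).total_seconds()
    return None  # no complete disconnect/restore pair found


sample = [
    "2020-07-19T06:38:13.528Z: [vmfsCorrelator] 3646187759603us: "
    "[esx.problem.vmfs.nfs.server.disconnect] 192.168.5.2 /VDI-Desktops5 "
    "c599f9b9-80e71816-0000-000000000000 VDI-Desktops5",
    "2020-07-19T07:10:17.786Z: [vmfsCorrelator] 3648112017539us: "
    "[esx.problem.vmfs.nfs.server.restored] 192.168.5.2 /VDI-Desktops5 "
    "c599f9b9-80e71816-0000-000000000000 VDI-Desktops5",
]
outage = nfs_outage_seconds(sample)
print(f"NFS outage: {outage:.3f} s")
```

For this host the result is roughly 1924 seconds, or about 32 minutes, which falls inside the reported incident window.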

 

VMKernel Logs:

 

  • The VMkernel logs show I/O errors against the datastore from 06:36:13 UTC, followed by the loss of the connection to the NFS server 192.168.5.2 (mount point /VDI-Desktops5):

 

2020-07-19T06:36:13.530Z cpu32:1289069)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :I/O error
2020-07-19T06:36:44.529Z cpu35:35866)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :I/O error
2020-07-19T06:37:15.530Z cpu29:1289069)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :I/O error
2020-07-19T06:37:45.529Z cpu2:34299)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :I/O error
2020-07-19T06:38:13.528Z cpu11:10120351)WARNING: NFS: 338: Lost connection to the server 192.168.5.2 mount point /VDI-Desktops5, mounted as c599f9b9-80e71816-0000-000000000000 ("VDI-Desktops5")
2020-07-19T06:38:15.528Z cpu29:1289069)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :I/O error

 

  • After that, requests continued to fail with the status "No connection":

 

2020-07-19T06:38:15.529Z cpu29:1289069)WARNING: NFS: 2208: Failed to get attributes (No connection)
2020-07-19T06:38:15.529Z cpu29:1289069)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :No connection
2020-07-19T06:38:15.529Z cpu29:1289069)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :No connection

2020-07-19T07:04:11.005Z cpu7:34992 opID=221e3886)WARNING: cbrc_filter: CBRCDigestRecomputeInt:3136: Digest 0x430c6a8e7340 - Unable to read data from VMDK file - error Busy bytesTransferred 0

2020-07-19T07:08:15.902Z cpu27:34568)WARNING: NFS: 2208: Failed to get attributes (No connection)
2020-07-19T07:08:21.315Z cpu4:34992 opID=221e3886)WARNING: cbrc_filter: CBRCDigestReadWriteHashKeyChunkInt:576: Digest 0x430c6a8e7340 - Failed to write digest block keychunk. Error No connection
2020-07-19T07:08:21.315Z cpu4:34992 opID=221e3886)WARNING: cbrc_filter: CBRCDigestFlush:743: Digest 0x430c6a8e7340 - Error writing hash keys - all hash keys in digest will be invalidated
2020-07-19T07:08:36.591Z cpu6:1393755 opID=15bd04b0)WARNING: cbrc_filter: CBRCDigestRecomputeInt:3136: Digest 0x430c6a8e7340 - Unable to read data from VMDK file - error Busy bytesTransferred 0

  • At 07:10:17 UTC the connection was re-established:

 

2020-07-19T07:10:17.786Z cpu3:10067066)NFS: 347: Restored connection to the server 192.168.5.2 mount point /VDI-Desktops5, mounted as c599f9b9-80e71816-0000-000000000000 ("VDI-Desktops5")

2020-07-19T07:10:19.524Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e41670(KDC-W10-S0223-b7a7102b.vswp) again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e4ad50(KDC-W10-S0470-b7a7194f.vswp) again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e55970() again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6d7e4a0() again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e372d0(KDC-W10-S0318-digest-flat.vmdk) again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e1a390(vmx-KDC-W10-S0318-3081180240-1.vswp) again
2020-07-19T07:10:19.526Z cpu51:33360)NFSLock: 579: Start accessing fd 0x4304f6e51570(vmx-KDC-W10-S0223-3081179179-1.vswp) again
2020-07-19T07:10:19.526Z cpu51:10249443)NFSLock: 579: Start accessing fd 0x4304f6e1d030(vmx-KDC-W10-S0511-3081182411-1.vswp) again
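The VMkernel excerpts for this host show the failure status changing from "I/O error" to "No connection" around the time the connection was declared lost. A small Python sketch such as the one below can summarise when each status first appeared. The pattern `NFS_FAILURE` and the helper `first_seen_by_status` are assumptions derived from the lines quoted here; other ESXi builds may log these messages differently.

```python
import re
from collections import Counter

# Matches "NFS: <line>: Failed to get object|attributes ... <status>" entries.
NFS_FAILURE = re.compile(
    r"^(?P<ts>\S+Z) cpu\d+:\d+(?: opID=\w+)?\)(?:WARNING: )?NFS: \d+: "
    r"Failed to get (?:object|attributes).*?"
    r"(?P<status>I/O error|No connection)\)?$"
)


def first_seen_by_status(lines):
    """Map each failure status to (first timestamp, number of occurrences)."""
    first, counts = {}, Counter()
    for line in lines:
        match = NFS_FAILURE.match(line.strip())
        if match:
            first.setdefault(match["status"], match["ts"])
            counts[match["status"]] += 1
    return {status: (first[status], counts[status]) for status in first}


obj = "48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0"
sample = [
    f"2020-07-19T06:36:13.530Z cpu32:1289069)NFS: 2264: Failed to get object {obj} :I/O error",
    f"2020-07-19T06:38:15.528Z cpu29:1289069)NFS: 2264: Failed to get object {obj} :I/O error",
    "2020-07-19T06:38:15.529Z cpu29:1289069)WARNING: NFS: 2208: Failed to get attributes (No connection)",
    f"2020-07-19T06:38:15.529Z cpu29:1289069)NFS: 2264: Failed to get object {obj} :No connection",
]
summary = first_seen_by_status(sample)
for status, (first_ts, count) in summary.items():
    print(f"{status}: first seen {first_ts}, {count} occurrence(s)")
```

On the sample lines, the first "I/O error" appears about two minutes before the first "No connection", which matches the sequence described in the bullets above.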

 

 

Hostname: esx-ABC2E2VDI0005XP.abc.com.sg

VOBD Logs:

 

2020-07-19T06:38:41.753Z: [vmfsCorrelator] 29649042881930us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.5.2 /VDI-Desktops5 c599f9b9-80e71816-0000-000000000000 VDI-Desktops5

 

 

2020-07-19T07:10:17.716Z: [vmfsCorrelator] 29650938845654us: [esx.problem.vmfs.nfs.server.restored] 192.168.5.2 /VDI-Desktops5 c599f9b9-80e71816-0000-000000000000 VDI-Desktops5

 

 

2020-07-19T15:26:38.963Z: [GenericCorrelator] 29680720091802us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-07-19T15:26:38.963Z: [UserLevelCorrelator] 29680720091802us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-07-19T15:26:38.963Z: [UserLevelCorrelator] 29680720091996us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2020-07-19T15:26:39.070Z: [GenericCorrelator] 29680720198806us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-07-19T15:26:39.070Z: [UserLevelCorrelator] 29680720198806us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-07-19T15:26:39.070Z: [UserLevelCorrelator] 29680720198907us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.

 

VMKernel Logs:

 

  • The same pattern can be seen here: the issue started with the mount point losing its connection.

 

2020-07-19T06:32:42.756Z cpu9:82523569)NFSLock: 620: Stop accessing fd 0x43052a57da00(KDC-W10-S0963-b7a72e76.vswp)  3
2020-07-19T06:32:42.756Z cpu9:82523569)NFSLock: 620: Stop accessing fd 0x43052a665430(KDC-W10-S0975-b7a72e99.vswp)  3
2020-07-19T06:32:42.756Z cpu9:82523569)NFSLock: 620: Stop accessing fd 0x43052a60e010(vmx-KDC-W10-S0269-3081179317-1.vswp)  3
2020-07-19T06:32:42.756Z cpu9:82523569)NFSLock: 620: Stop accessing fd 0x43052a570140(KDC-W10-S0269-digest-flat.vmdk)  3
2020-07-19T06:32:42.756Z cpu9:82523569)NFSLock: 620: Stop accessing fd 0x43052a656410(vmx-KDC-W10-S0699-3081183772-1.vswp)  3
2020-07-19T06:33:47.965Z cpu36:83339685)WARNING: VSCSI: 3711: handle 15668(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:2
2020-07-19T06:33:47.965Z cpu36:83339685)WARNING: VSCSI: 2632: handle 15668(vscsi0:0):Ignoring double reset
2020-07-19T06:33:48.209Z cpu42:81932581)WARNING: VSCSI: 3711: handle 15679(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:2
2020-07-19T06:33:48.209Z cpu42:81932581)WARNING: VSCSI: 2632: handle 15679(vscsi0:0):Ignoring double reset
2020-07-19T06:33:48.237Z cpu30:83308835)WARNING: VSCSI: 3711: handle 15678(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:2
2020-07-19T06:36:18.781Z cpu36:34295)BC: 2440: Failed to write (uncached) object '.iormstats.sf': No connection
2020-07-19T06:36:18.786Z cpu9:73104584)BC: 5002: Failed to flush 1 buffers of size 8192 each for object 'host-28-hb' b00f 48 c599f9b9 80e71816 baf 0 1829298 0 0 1 f26e 54070bfa2083d 0 0: No connection
2020-07-19T06:38:41.753Z cpu42:82717021)WARNING: NFS: 338: Lost connection to the server 192.168.5.2 mount point /VDI-Desktops5, mounted as c599f9b9-80e71816-0000-000000000000 ("VDI-Desktops5")
2020-07-19T06:39:04.844Z cpu55:1480664)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :No connection
2020-07-19T06:39:04.844Z cpu55:1480664)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :No connection
2020-07-19T06:39:04.845Z cpu55:1480664)NFS: 2264: Failed to get object 48 c599f9b9 80e71816 baf 0 1 0 0 1 f26e 54070bfa2083d 0 0 :No connection

 

Conclusion:

 

  • The logs show that the connection to the NFS server failed with the error "Lost connection". Per the VMware KB article https://kb.vmware.com/s/article/1007352, this error is generally associated with a connectivity issue between the host and the NFS server.

 

  • Please ask the NFS vendor to check whether they can see any reason for the failure on the NFS side. Because multiple ESXi hosts were affected at the same time, a host-specific cause is unlikely.
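The cross-host argument can be checked numerically by comparing the VOBD timestamps from both hosts. The sketch below does this in Python; the list `VOBD_EVENTS` and the helper `spread_seconds` are illustrative names, and the values are copied from the VOBD excerpts for the two hosts above (short host names are used for brevity).

```python
from datetime import datetime

# (host, event kind, UTC timestamp) taken from the VOBD excerpts per host.
VOBD_EVENTS = [
    ("esx-ABC2E2VDI0001XP", "disconnect", "2020-07-19T06:38:13.528"),
    ("esx-ABC2E2VDI0001XP", "restored", "2020-07-19T07:10:17.786"),
    ("esx-ABC2E2VDI0005XP", "disconnect", "2020-07-19T06:38:41.753"),
    ("esx-ABC2E2VDI0005XP", "restored", "2020-07-19T07:10:17.716"),
]


def spread_seconds(events, kind):
    """Seconds between the earliest and latest host reporting this event kind."""
    times = [
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
        for _host, event_kind, ts in events
        if event_kind == kind
    ]
    return (max(times) - min(times)).total_seconds()


disconnect_spread = spread_seconds(VOBD_EVENTS, "disconnect")
restore_spread = spread_seconds(VOBD_EVENTS, "restored")
print(f"disconnect spread: {disconnect_spread:.3f} s")
print(f"restore spread:    {restore_spread:.3f} s")
```

Both hosts lost the datastore within about 28 seconds of each other and regained it within about 70 milliseconds, which is consistent with a single shared cause on the NFS or network side rather than independent host faults.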

 

Action Plan:

 

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
