RCA-8: NFS Datastore Disconnection in vCenter Infra


 

Vmkernel Logs:

 

    • From the Logs we can see that the device went to APD State (All Paths Down) and then Lost Connection:

 

2019-02-19T03:06:30.347Z cpu24:26332219)StorageApdHandler: 1204: APD start for 0x430d4e3f80d0 [70c4f158-64880d41]
2019-02-19T03:06:30.347Z cpu0:33270)StorageApdHandler: 421: APD start event for 0x430d4e3f80d0 [70c4f158-64880d41]
2019-02-19T03:06:30.347Z cpu0:33270)StorageApdHandlerEv: 110: Device or filesystem with identifier [70c4f158-64880d41] has entered the All Paths Down state.

2019-02-19T03:06:50.029Z cpu11:34522)WARNING: NFSLock: 2210: File is being locked by a consumer on host r****9-vif071-04.ci***.com with exclusive lock.
2019-02-19T03:06:50.029Z cpu11:34522)NFSLock: 2966: failed to get lock on file slotsfile 0x43064da8ab00 on r***9-c01-prod-miesx059 (10.31.64.103): Busy
2019-02-19T03:07:54.362Z cpu37:26338354)WARNING: NFS: 338: Lost connection to the server r***9-c01-prod-miesx078 mount point /miesx078/miXXX078_vol001, mounted as 70c4f158-64880d41-0000-000000000000 (“r***_vif42_03”)

 

2019-02-19T03:09:25.368Z cpu3:35499)WARNING: NFS: 2208: Failed to get attributes (No connection)
2019-02-19T03:09:25.368Z cpu3:35499)NFS: 2264: Failed to get object 60 70c4f158 64880d41 1 80733cf7 0 40 1d80b22b 80733cf7 4000000000 401d80b22b 0 0 :No connection

 

VOBD Logs:

 

    • From the VOBD Logs we can see that not just the NFS got disconnected form the Host:

 

2019-02-19T03:06:30.347Z: [APDCorrelator] 32300396546112us: [vob.storage.apd.start] Device or filesystem with identifier [70c4f158-64880d41] has entered the All Paths Down state.
2019-02-19T03:06:30.347Z: [APDCorrelator] 32300966469209us: [esx.problem.storage.apd.start] Device or filesystem with identifier [70c4f158-64880d41] has entered the All Paths Down state.
2019-02-19T03:07:54.363Z: [vmfsCorrelator] 32301050484595us: [esx.problem.vmfs.nfs.server.disconnect] r***9-c01-prod-miesx078 /miesx078/miXXX078_vol001 70c4f158-64880d41-0000-000000000000 r***_vif42_03
2019-02-19T03:08:50.352Z: [APDCorrelator] 32300536548159us: [vob.storage.apd.timeout] Device or filesystem with identifier [70c4f158-64880d41] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2019-02-19T03:08:50.352Z: [APDCorrelator] 32301106473695us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [70c4f158-64880d41] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

 

 

 

    • After which they were established back as well as the NFS Datastore:

 

2018-11-19T03:32:35.616Z: [vmfsCorrelator] 24353731737407us: [esx.problem.vmfs.nfs.server.restored] 10.31.64.37 /miesx012/miesx012_srm_hfm_1 85231c36-1cf1ecf0-0000-000000000000 HFM_PRD_R***9_SRM_cdot
2018-11-19T03:38:44.587Z: [vmfsCorrelator] 24354100708711us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx009 /miesx009/miesx009_R***_SODC_VIF_H1 2d875bf2-f9ca6e08-0000-000000000000 R***-SODC-VIF-H1_New
2018-11-19T03:43:25.212Z: [vmfsCorrelator] 24354381333533us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx009 /miesx009/miesx009_R***_SODC_VIF_H2 287fc75d-de5e2459-0000-000000000000 R***-SODC-VIF-H2_New
2018-11-19T03:46:50.434Z: [vmfsCorrelator] 24354586555935us: [esx.problem.vmfs.nfs.server.restored] 10.31.64.38 /miesx013/miesx013_srm_ifp_1 1dd12cde-349caa46-0000-000000000000 r***9_vif42_srm_ifp_1_cdot
2018-11-19T03:49:58.733Z: [vmfsCorrelator] 24354774854816us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx009 /miesx009/miesx009_IWE_SRM 2c5f96c2-907d77ac-0000-000000000000 IWE_SRM_SODC_CDOT
2018-11-19T03:53:39.667Z: [vmfsCorrelator] 24354995789163us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx076 /miesx076/miesx076_vol001 64f5c673-d51aa00f-0000-000000000000 r***_vif11_04
2018-11-19T03:56:55.904Z: [vmfsCorrelator] 24355192025508us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx061 /miesx061/miesx061_vol001 e3dbf413-bbad0a36-0000-000000000000 r***_vif12_01
2018-11-19T03:58:55.372Z: [vmfsCorrelator] 24355311493443us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx077 /miesx077/miesx077_vol001 4a367e56-6def379f-0000-000000000000 r***_vif12_03
2018-11-19T04:03:51.263Z: [vmfsCorrelator] 24355607385063us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx065 /miesx065/miesx065_vol003 2b73fd1f-46ce9d7d-0000-000000000000 r***_vif12_1
2018-11-19T04:08:07.137Z: [vmfsCorrelator] 24355863259096us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx062 /miesx062/miesx062_vol001 a9acb722-2c1c7e92-0000-000000000000 r***_vif42_02
2018-11-19T04:11:27.815Z: [vmfsCorrelator] 24356063936412us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx078 /miesx078/miXXX078_vol001 70c4f158-64880d41-0000-000000000000 r***_vif42_03
2018-11-19T04:14:34.220Z: [vmfsCorrelator] 24356250342226us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx061 /miesx061/miesx061_vol002 16744781-7046adc8-0000-000000000000 r***_vif43_1
2018-11-19T04:16:30.679Z: [vmfsCorrelator] 24356366801226us: [esx.problem.vmfs.nfs.server.restored] r***9-c01-prod-miesx060 /miesx060/miesx060_vol002 8f6064fb-b374256b-0000-000000000000 r***_vif43_2

 

 

Hostd:

 

2019-02-19T03:07:54.365Z info hostd[77E8DB70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 35130 : Lost connection to server r***9-c01-prod-miesx078 mount point /miesx078/miXXX078_vol001 mounted as 70c4f158-64880d41-0000-000000000000 (r***_vif42_03).

2019-02-19T03:08:50.353Z info hostd[77E8DB70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 35131 : Device or filesystem with identifier 70c4f158-64880d41 has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2019-02-19T03:08:51.363Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.364Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.
2019-02-19T03:08:51.370Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/cssere****-prd-29/cssere****-prd-29.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.370Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/cssere****-prd-29/cssere****-prd-29.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.
2019-02-19T03:08:51.373Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/icam-prod-ms-10/icam-prod-ms-10.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.373Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/icam-prod-ms-10/icam-prod-ms-10.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.
2019-02-19T03:08:51.374Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/ud-r***-prd-10/ud-r***-prd-10.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.374Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/ud-r***-prd-10/ud-r***-prd-10.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.
2019-02-19T03:08:51.376Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/sensu-prd-r***-01/sensu-prd-r***-01.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.376Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/sensu-prd-r***-01/sensu-prd-r***-01.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.
2019-02-19T03:08:51.383Z warning hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/ud-***-prd1-22/ud-***-prd1-22.vmx] UpdateStorageAccessibilityStatusInt: The datastore r***9-c01-prod-miesx078:/miesx078/miXXX078_vol001 is not accessible
2019-02-19T03:08:51.383Z info hostd[7718CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/ud-***-prd1-22/ud-***-prd1-22.vmx] VM config backing gone — marking VM _postAct to MARKBADLOAD.

2019-02-19T03:09:25.372Z info hostd[77803B70] [Originator@6876 sub=DiskLib] DISKLIB-DSCPTR: DescriptorOpenInt: failed to open ‘/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmdk’: Input/output error (50004)
2019-02-19T03:09:25.372Z info hostd[77803B70] [Originator@6876 sub=DiskLib] DISKLIB-LINK  : “/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmdk” : failed to open (Input/output error).
2019-02-19T03:09:25.372Z info hostd[77803B70] [Originator@6876 sub=DiskLib] DISKLIB-CHAIN : “/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmdk” : failed to open (Input/output error).
2019-02-19T03:09:25.372Z info hostd[77803B70] [Originator@6876 sub=DiskLib] DISKLIB-LIB   : Failed to open ‘/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmdk’ with flags 0x110015 Input/output error (327689).
2019-02-19T03:09:25.372Z info hostd[77803B70] [Originator@6876 sub=DiskLib] DISKLIB-LIB_MISC   : DiskLibEnumExtents: failed to open ‘/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmdk’: Input/output error (327689).
2019-02-19T03:09:25.372Z error hostd[77803B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/cxm-***-001-p-n/cxm-***-001-p-n.vmx] Failed to enumerate disk [r***_vif42_03] cxm-***-001-p-n/cxm-***-001-p-n.vmdk, Input/output error
2019-02-19T03:09:25.375Z warning hostd[77803B70] [Originator@6876 sub=Default] FileLayoutExToFileLayout: While populating disk layoutSkipping empty diskLayout configuration for VM [r***_vif42_03] cxm-***-001-p-n/cxm-***-001-p-n.vmx, key 2000
2019-02-19T03:09:25.375Z warning hostd[77803B70] [Originator@6876 sub=Default] FileLayoutExToFileLayout: While populating disk layoutSkipping empty diskLayout configuration for VM [r***_vif42_03] cxm-***-001-p-n/cxm-***-001-p-n.vmx, key 2001
2019-02-19T03:09:33.837Z warning hostd[78080B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/70c4f158-64880d41/sensu-prd-r***-01/sensu-prd-r***-01.vmx] Failed to find activation record, event user unknown.

 

Conclusion:

    • From the Logs we can only see that the NFS Datastore was disconnected, but the processes didn’t properly released the handle and then was not able to properly connect back and due to which we were seeing the error.
    • We are still in process of investigating this issue from the Dump that we have collected. We will let you know the status as soon as we will be able to review the dump file.

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.

Leave a Reply