Datastore Name: ABCD-CO2-IN
Last 4 Digits of the NAA ID: 3d1f
Time of Issue: 23rd March : Morning 1:45 – 2:00 AM SGT
Time in GMT: 22nd March : 5:45 PM – 6:00 PM
GMT
Hostname: abcdgate801
Uptime: 932.4 days
ESXi Version: ESXi 6.5 U1g
VOBD Logs:
- From the
VOBD Logs we are not able to see much details other than the access
Events.
2021-03-22T17:47:30.513Z: [scsiCorrelator]
80421623249027us: [vob.scsi.device.state.off] Device naa.60060e80122e180050402e180000abcd has been
turned off administratively.
2021-03-22T17:47:30.513Z: [scsiCorrelator] 80423193527311us:
[esx.problem.scsi.device.state.off] Device: naa.60060e80122e180050402e180000abcd has been turned off administratively
2021-03-22T17:53:02.678Z: [scsiCorrelator]
80421955407455us: [vob.scsi.device.state.on] Device naa.60060e80122e180050402e180000abcd has been turned on administratively.
2021-03-22T17:53:02.678Z: [scsiCorrelator] 80423525692424us:
[esx.clear.scsi.device.state.on] Device: naa.60060e80122e180050402e180000abcd has been turned on administratively
- Around 19:01 which is
after 1 Hour we can see the Datastore mounted in RW Mode:
2021-03-22T19:01:38.277Z: [vmfsCorrelator]
80426070924700us: [vob.vmfs.volume.mounted] File system ‘[snap-2b898787-ABCD-CO2-INTERNAL-02-R-038,
6058e98f-c6fc4ac0-364c-0025b51501d4]’ (LV
snap-0486436f-5858a163-b78f9844-939e-0025b50100ec) mounted in ‘rw’ mode.
2021-03-22T19:01:38.277Z:
[vmfsCorrelator] 80427641292052us: [esx.audit.vmfs.volume.mounted] File system
[snap-2b898787-ABCD-CO2-INTERNAL-02-R-038,
6058e98f-c6fc4ac0-364c-0025b51501d4] on volume
snap-0486436f-5858a163-b78f9844-939e-0025b50100ec has been mounted in ‘rw’
mode. The datastore is now accessible on this host.
2021-03-22T22:16:33.184Z:
[vmfsCorrelator] 80437765598028us: [vob.vmfs.volume.umounted] File system ‘[ABCD-CO2-INTERNAL-02-R-038,
6058e98f-c6fc4ac0-364c-0025b51501d4]’ (LV
snap-0486436f-5858a163-b78f9844-939e-0025b50100ec) un-mounted.
2021-03-22T22:16:33.184Z:
[vmfsCorrelator] 80439336198569us: [esx.audit.vmfs.volume.umounted] File system
[ABCD-CO2-INTERNAL-02-R-038, 6058e98f-c6fc4ac0-364c-0025b51501d4] on
volume snap-0486436f-5858a163-b78f9844-939e-0025b50100ec has been safely
un-mounted. The datastore is no longer accessible on this host.
2021-03-22T22:32:11.513Z:
[vmfsCorrelator] 80438703907997us: [vob.vmfs.volume.umounted] File system
‘[snap-571c4ad3-PURE-NCCO2-INT-APP-PRD-WIN-01-R-019,
6058dd1f-d7615743-777c-0025b5150145]’ (LV
snap-2b2ab231-59dad75a-781a4886-037c-0025b51801d7) un-mounted.
2021-03-22T22:32:11.513Z:
[vmfsCorrelator] 80440274527312us: [esx.audit.vmfs.volume.umounted] File system
[snap-571c4ad3-PURE-NCCO2-INT-APP-PRD-WIN-01-R-019,
6058dd1f-d7615743-777c-0025b5150145] on volume
snap-2b2ab231-59dad75a-781a4886-037c-0025b51801d7 has been safely un-mounted.
The datastore is no longer accessible on this host.
VPXA Logs:
- Don’t
have the logs for the time of issue:
2021-03-23T11:52:45.634Z verbose vpxa[7DE6B70]
[Originator@6876 sub=VpxaHalCnxHostagent opID=WFU-7d60e389] Completed
WaitForUpdatesDone callback
2018-09-03T22:02:51.473Z Section for VMware ESX, pid=67571, version=6.5.0,
build=7967591, option=Release
2021-03-23T11:52:47.585Z verbose vpxa[5E2AB70] [Originator@6876
sub=VpxaHalCnxHostagent opID=WFU-63fa77ad] Received WaitForUpdatesDone callback
VMKernel Logs:
- In the
Vmkernel Logs we can see below that the Datastore is detected as a
Snapshot which is expected as we are mounted a snapshot LUN
2021-03-22T17:42:09.798Z cpu0:68084 opID=b2e49de0)LVM:
11138: Device naa.60060e80122e180050402e180000abcd:1 detected to be a
snapshot:
2021-03-22T17:47:30.512Z
cpu6:68079 opID=d11cee23)World: 12235: VC opID
kls7slhj-132yv-h5:70048203-8a-6b-e0c5 maps to vmkernel opID d11cee23
2021-03-22T17:47:30.512Z
cpu6:68079 opID=d11cee23)ScsiDevice: 1860: Device naa.60060e80122e180050402e180000abcd has been turned off administratively.
2021-03-22T17:53:02.670Z cpu31:68086
opID=86282f39)World: 12235: VC opID kls7slhj-1345v-h5:70048400-15-af-e953 maps
to vmkernel opID 86282f39
2021-03-22T17:53:02.670Z
cpu31:68086 opID=86282f39)VAAI_FILTER: VaaiFilterClaimDevice:270: Attached vaai
filter (vaaip:VMW_VAAIP_HDS) to logical device ‘naa.60060e80122e180050402e180000abcd‘
2021-03-22T17:53:02.675Z
cpu28:68086 opID=86282f39)LVM: 11138: Device naa.60060e80122e180050402e180000abcd:1
detected to be a snapshot:
2021-03-22T17:53:02.675Z
cpu28:68086 opID=86282f39)LVM: 11145:
queried disk ID: <type 2, len 22, lun 24, devType 0, scsi 0, h(id)
16374344479590390527>
2021-03-22T17:53:02.675Z
cpu28:68086 opID=86282f39)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 32, devType 0, scsi 0, h(id)
6771443636113600371>
2021-03-22T17:53:02.678Z
cpu28:68086 opID=86282f39)ScsiDevice: 1867: Device naa.60060e80122e180050402e180000abcd has been turned on administratively.
2021-03-22T17:56:49.149Z cpu29:68083
opID=e1f149bb)World: 12235: VC opID kls7slhj-134z3-h5:70048607-d2-35-eee3 maps
to vmkernel opID e1f149bb
2021-03-22T17:56:49.149Z
cpu29:68083 opID=e1f149bb)LVM: 11138: Device naa.60060e80122e180050402e180000abcd:1
detected to be a snapshot:
2021-03-22T17:56:49.149Z
cpu29:68083 opID=e1f149bb)LVM: 11145:
queried disk ID: <type 2, len 22, lun 24, devType 0, scsi 0, h(id)
16374344479590390527>
2021-03-22T17:56:49.149Z
cpu29:68083 opID=e1f149bb)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 32, devType 0, scsi 0, h(id)
6771443636113600371>
2021-03-22T17:56:49.176Z
cpu30:68083 opID=e1f149bb)LVM: 11138: Device
naa.624a9370b741645ba7134008002a5e3f:1 detected to be a snapshot:
2021-03-22T17:56:49.176Z
cpu30:68083 opID=e1f149bb)LVM: 11145:
queried disk ID: <type 2, len 22, lun 254, devType 0, scsi 0, h(id)
5976808552728752976>
2021-03-22T17:56:49.176Z
cpu30:68083 opID=e1f149bb)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 19, devType 0, scsi 0, h(id)
188742552221750601>
2021-03-22T17:56:49.204Z
cpu30:68083 opID=e1f149bb)LVM: 11138: Device
naa.60060e8007c7c7000030c7c70000c001:1 detected to be a snapshot:
2021-03-22T17:56:49.204Z
cpu30:68083 opID=e1f149bb)LVM: 11145:
queried disk ID: <type 2, len 22, lun 3, devType 0, scsi 0, h(id)
17977420854024958367>
2021-03-22T17:56:49.204Z
cpu30:68083 opID=e1f149bb)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 47, devType 0, scsi 0, h(id)
1133961160311467624>
2021-03-22T17:56:49.206Z
cpu30:68083 opID=e1f149bb)LVM: 11138: Device
naa.60060e80122e180050402e1800003d28:1 detected to be a snapshot:
2021-03-22T17:56:49.206Z
cpu30:68083 opID=e1f149bb)LVM: 11145:
queried disk ID: <type 2, len 22, lun 14, devType 0, scsi 0, h(id)
17431598461715629954>
2021-03-22T17:56:49.206Z
cpu30:68083 opID=e1f149bb)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 5, devType 0, scsi 0, h(id)
6799844613737751893>
2021-03-22T17:56:49.296Z
cpu18:68083 opID=e1f149bb)LVM: 11138: Device naa.624a9370b741645ba7134008002aabcd:1
detected to be a snapshot:
2021-03-22T17:56:49.296Z
cpu18:68083 opID=e1f149bb)LVM: 11145:
queried disk ID: <type 2, len 22, lun 253, devType 0, scsi 0, h(id)
12879439484924876419>
2021-03-22T17:56:49.296Z
cpu18:68083 opID=e1f149bb)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 21, devType 0, scsi 0, h(id)
18411845478220440408>
2021-03-22T17:56:49.854Z
cpu16:68083 opID=e1f149bb)VC: 4511: Device rescan time 326 msec (total number
of devices 87)
2021-03-22T17:56:49.854Z
cpu16:68083 opID=e1f149bb)VC: 4514: Filesystem probe time 514 msec (devices
probed 67 of 87)
2021-03-22T17:56:49.854Z
cpu16:68083 opID=e1f149bb)VC: 4516: Refresh open volume time 20 msec
- However
while it is being Mounted we can see the SCSI failure happening on the
Datastore with the Sense Data: 0x7
0x22 0x0.
2021-03-22T18:00:01.064Z cpu27:18384909)NMP:
nmp_ThrottleLogForDevice:3647: Cmd 0xc0 (0x439d80aa0740, 5560042) to dev
“naa.60060e80122e180050402e180000abcd” on path
“vmhba1:C0:T3:L24” Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x7
0x22 0x0. Act:NONE
2021-03-22T18:00:01.064Z
cpu27:18384909)ScsiDeviceIO: 2962: Cmd(0x439d80a151c0) 0xfe, CmdSN 0x2 from
world 5560042 to dev “naa.60060e80122e180050402e180000abcd”
failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x7 0x22 0x0.
- Post
which we can see the Failure happening with the I/O Error:
2021-03-22T18:00:01.064Z cpu26:5560042
opID=1840e305)World: 12235: VC opID kls7slhj-135bm-h5:70048646-e-f7-f3af maps
to vmkernel opID 1840e305
2021-03-22T18:00:01.064Z
cpu26:5560042 opID=1840e305)LVM: 6493: Forcing APD unregistration of devID
5858a163-de77b521-7267-0025b50100ec in state 1.
2021-03-22T18:00:01.064Z cpu26:5560042 opID=1840e305)LVM: 15406: Failed
to open device naa.60060e80122e180050402e180000abcd:1 (0): I/O error
2021-03-22T18:11:44.184Z
cpu16:68079 opID=4cdd5014)LVM: 11138: Device naa.60060e80122e180050402e180000abcd:1
detected to be a snapshot:
2021-03-22T18:59:37.981Z
cpu3:68039 opID=d77eb588)LVM: 11138: Device naa.60060e80122e180050402e180000abcd:1
detected to be a snapshot:
Hostd Logs:
- Reviewed
the Hostd logs, but the logs are rolled over.
2021-03-23T05:40:43.344Z verbose hostd[15B80B70]
[Originator@6876 sub=PropertyProvider opID=d1731731 user=root] RecordOp ASSIGN:
info, haTask–vim.AuthorizationManager.retrieveAllPermissions-310855166.
Applied change to temp map.
2021-03-23T05:40:43.344Z verbose hostd[15B80B70] [Originator@6876
sub=PropertyProvider opID=d1731731 user=root] RecordOp ASSIGN: info,
haTask–vim.AuthorizationManager.retrieveAllPermissions-310855166. Applied
change to temp map.
2018-09-03T22:02:48.733Z Section for VMware ESX, pid=67272, version=6.5.0,
build=7967591, option=Release
Hostname: abcdgate803
Uptime: 932.5 days
ESXi Version: ESXi 6.5 U1g
VOBD Logs:
- Same
thing can be seen in the VOBD Logs of this ESXi Host, the Datastore Access
events:
2021-03-22T17:49:18.005Z: [scsiCorrelator]
80431522949141us: [vob.scsi.device.state.off] Device naa.60060e80122e180050402e180000abcd has been turned off administratively.
2021-03-22T17:49:18.005Z: [scsiCorrelator] 80432554955638us:
[esx.problem.scsi.device.state.off] Device: naa.60060e80122e180050402e180000abcd has been turned off administratively
2021-03-22T17:54:34.779Z: [scsiCorrelator]
80431839719871us: [vob.scsi.device.state.on] Device naa.60060e80122e180050402e180000abcd has been turned on administratively.
2021-03-22T17:54:34.780Z: [scsiCorrelator] 80432871730384us:
[esx.clear.scsi.device.state.on] Device: naa.60060e80122e180050402e180000abcd has been turned on administratively
VPXA Logs:
- Logs are
rolled over:
2021-03-23T14:43:47.405Z info vpxa[88C6B70]
[Originator@6876 sub=vpxLro opID=43b8fa28-96] [VpxLRO] — FINISH lro-8734433
2021-03-23T14:43:47.418Z info vpxa[87A4B70] [Originator@6876 sub=vpxLro
opID=4bc50c0a-4d] [VpxLRO] — BEGIN lro-8734434 — vpxa —
vpxapi.VpxaService.queryBatchPerformanceStatistics —
525da189-75a9-8f39-3ea7-e5cf18304310
VMKernel Logs:
- In the
VMKernel Logs we can see the same set of events for the SCSI Failure
happening with 0x7 0x22 0x0.
2021-03-22T17:38:27.385Z cpu12:18301638)NMP:
nmp_ThrottleLogForDevice:3647: Cmd 0xc0 (0x43958da59d00, 1936130) to dev
“naa.60060e80122e180050402e180000abcd” on path
“vmhba0:C0:T2:L24” Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x7
0x22 0x0. Act:NONE
2021-03-22T17:38:27.385Z
cpu12:18301638)ScsiDeviceIO: 2962: Cmd(0x4395811b2140) 0xfe, CmdSN 0x2 from
world 1936130 to dev “naa.60060e80122e180050402e180000abcd”
failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x7 0x22 0x0.
2021-03-22T17:38:27.385Z cpu4:1936130
opID=a920ee53)World: 12235: VC opID
AddDatastoreWizard-add-685054-ngc:70037516-9b-8a-933a maps to vmkernel opID
a920ee53
2021-03-22T17:38:27.385Z
cpu4:1936130 opID=a920ee53)LVM: 6493: Forcing APD unregistration of devID
5858a163-de77b521-7267-0025b50100ec in state 1.
2021-03-22T17:38:27.385Z cpu4:1936130 opID=a920ee53)LVM: 15406: Failed
to open device naa.60060e80122e180050402e180000abcd:1 (0): I/O error
2021-03-22T17:42:09.900Z
cpu6:68000 opID=c7f8b799)World: 12235: VC opID kls7slhj-131z2-h5:70047937-3a-83-99d4
maps to vmkernel opID c7f8b799
2021-03-22T17:42:09.900Z
cpu6:68000 opID=c7f8b799)LVM: 11138: Device naa.60060e80122e180050402e180000abcd:1
detected to be a snapshot:
2021-03-22T17:42:09.900Z
cpu6:68000 opID=c7f8b799)LVM: 11145:
queried disk ID: <type 2, len 22, lun 24, devType 0, scsi 0, h(id)
16374344479590390527>
2021-03-22T17:42:09.900Z
cpu6:68000 opID=c7f8b799)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 32, devType 0, scsi 0, h(id)
6771443636113600371>
2021-03-22T17:42:09.917Z
cpu6:68000 opID=c7f8b799)LVM: 11138: Device naa.624a9370b741645ba7134008002aabcd:1
detected to be a snapshot:
2021-03-22T17:42:09.917Z
cpu6:68000 opID=c7f8b799)LVM: 11145:
queried disk ID: <type 2, len 22, lun 253, devType 0, scsi 0, h(id)
12879439484924876419>
2021-03-22T17:42:09.917Z
cpu6:68000 opID=c7f8b799)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 21, devType 0, scsi 0, h(id)
18411845478220440408>
2021-03-22T17:42:09.939Z
cpu6:68000 opID=c7f8b799)LVM: 11138: Device
naa.624a9370b741645ba7134008002a5e3f:1 detected to be a snapshot:
2021-03-22T17:42:09.939Z
cpu6:68000 opID=c7f8b799)LVM: 11145:
queried disk ID: <type 2, len 22, lun 254, devType 0, scsi 0, h(id)
5976808552728752976>
2021-03-22T17:42:09.939Z
cpu6:68000 opID=c7f8b799)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 19, devType 0, scsi 0, h(id)
188742552221750601>
2021-03-22T17:42:09.978Z
cpu3:68000 opID=c7f8b799)LVM: 11138: Device
naa.60060e8007c7c7000030c7c70000c001:1 detected to be a snapshot:
2021-03-22T17:42:09.978Z
cpu3:68000 opID=c7f8b799)LVM: 11145:
queried disk ID: <type 2, len 22, lun 3, devType 0, scsi 0, h(id)
17977420854024958367>
2021-03-22T17:42:09.978Z
cpu3:68000 opID=c7f8b799)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 47, devType 0, scsi 0, h(id)
1133961160311467624>
2021-03-22T17:42:09.982Z
cpu3:68000 opID=c7f8b799)LVM: 11138: Device
naa.60060e80122e180050402e1800003d28:1 detected to be a snapshot:
2021-03-22T17:42:09.982Z
cpu3:68000 opID=c7f8b799)LVM: 11145:
queried disk ID: <type 2, len 22, lun 14, devType 0, scsi 0, h(id)
17431598461715629954>
2021-03-22T17:42:09.982Z
cpu3:68000 opID=c7f8b799)LVM: 11152:
on-disk disk ID: <type 2, len 22, lun 5, devType 0, scsi 0, h(id)
6799844613737751893>
2021-03-22T17:42:10.811Z
cpu10:68000 opID=c7f8b799)VC: 4511: Device rescan time 308 msec (total number
of devices 87)
2021-03-22T17:42:10.811Z
cpu10:68000 opID=c7f8b799)VC: 4514: Filesystem probe time 734 msec (devices
probed 55 of 87)
2021-03-22T17:42:10.811Z
cpu10:68000 opID=c7f8b799)VC: 4516: Refresh open volume time 14 msec
Conclusion:
- Based on
the logs we can conclude that the datastore mount operation failed on the
Datastore with the SCSI Error: H:0x0 D:0x2 P:0x0 Valid
sense data: 0x7 0x22 0x0. Act:NONE
Which means:
Type |
Code |
Name |
Description |
Host Status |
[0x0] |
OK |
This status is returned when
there is no error on the host side. This is when you will see if there is a
status for a Device or Plugin. It is also when you will see Valid sense data
instead of Possible sense Data. |
Device Status |
[0x2] |
CHECK_CONDITION |
This status is returned when a
command fails for a specific reason. When a CHECK CONDITION is received, the
ESX storage stack will send out a SCSI command 0x3 (REQUEST SENSE) in order
to get the SCSI sense data (Sense Key, Additional Sense Code, ASC Qualifier,
and other bits). The sense data is listed after Valid sense data in the order
of Sense Key, Additional Sense Code, and ASC Qualifier. |
Plugin Status |
[0x0] |
GOOD |
No error. (ESXi 5.x / 6.x only) |
Sense Key |
[0x7] |
DATA PROTECT |
|
Additional Sense Data |
22/00 |
ILLEGAL FUNCTION (USE 20 00, 24
00, OR 26 00) |
|
- We can
see that the resignaturing failed with I/O error because of the DATA
PROTECT error from the array side.
- During
the time of mount the Data Protection Operation might cause the Datastore
to fail and it could be because of the either LUN not being released
properly post the replication was broken or write protection being
enabled.
- Next time
when the Datastore is mount with rw we can see that the Datastore is
mounted:
2021-03-22T19:01:38.277Z: [vmfsCorrelator]
80426070924700us: [vob.vmfs.volume.mounted] File system ‘[snap-2b898787-ABCD-CO2-INTERNAL-02-R-038,
6058e98f-c6fc4ac0-364c-0025b51501d4]’ (LV
snap-0486436f-5858a163-b78f9844-939e-0025b50100ec) mounted in ‘rw’ mode.
2021-03-22T19:01:38.277Z:
[vmfsCorrelator] 80427641292052us: [esx.audit.vmfs.volume.mounted] File system
[snap-2b898787-ABCD-CO2-INTERNAL-02-R-038,
6058e98f-c6fc4ac0-364c-0025b51501d4] on volume
snap-0486436f-5858a163-b78f9844-939e-0025b50100ec has been mounted in ‘rw’
mode. The datastore is now accessible on this host.
Action Plan:
- Please
investigate this from the Storage end if you are able to see any
difference between the way it was mounted earlier to the next mount
operation which tend to fix the issue.
- Please
check if there is any Lock which was not able to release during the break
of replication which could be the reason for this issue.
- Please
check if you are running on the firmware version 4.0(1j) as the current
version of driver is fnic version 1.6.0.37
- From https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=37358&deviceCategory=io&details=1&VID=1137&DID=0045&SVID=1137&SSID=012c&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc>