Hostname: abcesxpap41
VMware ESXi 6.0.0 build-9313334 ESXi 6.0 EP 15
Time: August 27, 2019,
at 08:06 am to 08:09 am HKT
Time in GMT: 12:06 AM –
12:09 AM GMT
vmhba |
driver |
driver
version |
VID |
DID |
SVID |
SDID |
model |
vmhba0 |
hpsa |
6.0.0.124-1OEM |
103c |
323b |
103c |
3355 |
Hewlett Packard Enterprise Smart Array P220i |
vmhba1 |
qlnativefc |
2.1.81.0-1OEM.600.0.0.2768847 |
1077 |
2532 |
103c |
338e |
QLogic Corp ISP2532-based 8Gb Fibre Channel to
PCI Express HBA |
vmhba2 |
qlnativefc |
2.1.81.0-1OEM.600.0.0.2768847 |
1077 |
2532 |
103c |
338e |
QLogic Corp ISP2532-based 8Gb Fibre Channel to
PCI Express HBA |
vmhba3 |
qlnativefc |
2.1.81.0-1OEM.600.0.0.2768847 |
1077 |
2532 |
103c |
338e |
QLogic Corp ISP2532-based 8Gb Fibre Channel to
PCI Express HBA |
vmhba4 |
qlnativefc |
2.1.81.0-1OEM.600.0.0.2768847 |
1077 |
2532 |
103c |
338e |
QLogic Corp ISP2532-based 8Gb Fibre Channel to
PCI Express HBA |
Adapter |
Driver |
Good CMDS |
Failed CMDS |
<– % |
MB read |
Failed MB
read |
<– % |
MB written |
Failed MB
written |
<– % |
Res. ops |
Failed res.
ops |
Res.
conflicts |
vmhba0 |
hpsa |
13408315 |
14665 |
0.10% |
169284.5 |
0 |
0.00% |
44278.84 |
0 |
0.00% |
48580 |
0 |
0 |
vmhba1 |
qlnativefc |
90100562 |
1910108 |
2.10% |
753961.6 |
0 |
0.00% |
199789.5 |
0 |
0.00% |
0 |
0 |
0 |
vmhba2 |
qlnativefc |
90051489 |
8285333 |
8.40% |
753292.2 |
0 |
0.00% |
200414.9 |
0 |
0.00% |
0 |
0 |
0 |
vmhba3 |
qlnativefc |
90040104 |
27330240 |
23.30% |
753592.8 |
0 |
0.00% |
199704.4 |
0 |
0.00% |
0 |
0 |
0 |
vmhba4 |
qlnativefc |
90051269 |
20862765 |
18.80% |
753207.9 |
0 |
0.00% |
200076.3 |
0 |
0.00% |
0 |
0 |
0 |
vmhba32 |
bnx2i |
0 |
0 |
– |
0 |
0 |
– |
0 |
0 |
– |
0 |
0 |
0 |
vmhba33 |
bnx2i |
0 |
0 |
– |
0 |
0 |
– |
0 |
0 |
– |
0 |
0 |
0 |
the |
Rx |
Tx |
Rx |
Rx |
Tx |
Tx |
Rx |
Tx |
Rx |
Tx |
|
|
Rx |
Rx |
Rx |
Rx |
Rx |
Rx |
Tx |
Tx |
Rx |
vmnic |
Pkts |
Pkts |
Bytes |
GB |
Bytes |
GB |
Err |
Err |
Drop |
Drop |
M-cast |
Coll |
Len Err |
Ovr Err |
CRC Err |
Frm Err |
Fifo Err |
Miss Err |
Abt Err |
Carr Err |
Total Err |
vmnic0 |
3460168368 |
1050184023 |
2.7997E+12 |
260740.00% |
4.87E+12 |
4532 |
1026000.00% |
0 |
0 |
0.00% |
930551239 |
0 |
1161 |
0 |
0 |
0 |
9099 |
0 |
0 |
0 |
0.00% |
vmnic1 |
6749656176 |
9275562324 |
6.72775E+11 |
62660.00% |
6.12E+12 |
5697.3 |
929600.00% |
0 |
0 |
0.00% |
906003972 |
0 |
1161 |
0 |
0 |
0 |
8135 |
0 |
0 |
0 |
0.00% |
VMKernel Logs:
- From the vmkernel logs we can see that the Heartbeat of the
Datastore Timed out.
2019-08-27T00:06:06.100Z cpu25:33495)IntrCookie:
3429: Interrupt received on invalid vector (cpu 25, vector 76); ignoring it.
2019-08-27T00:06:17.041Z cpu29:108949)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_1’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:06:17.041Z cpu29:108949)
[HB state abcdef02 offset 3670016 gen 33 stampUS 1696332028433 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 1656800> drv 14.61 lockImpl
4]
2019-08-27T00:06:17.041Z cpu28:108954)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_2’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:06:17.041Z cpu28:108954)
[HB state abcdef02 offset 3670016 gen 3 stampUS 1696332028488 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 832400> drv 14.61 lockImpl
4]
2019-08-27T00:06:25.909Z cpu2:1715672)ScsiCore: 62: Starting taskmgmt handler
world 1715672/3
2019-08-27T00:06:27.907Z cpu17:1715673)ScsiCore: 62: Starting taskmgmt handler
world 1715673/4
2019-08-27T00:06:40.086Z cpu24:32995)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_1’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:06:40.086Z cpu24:32995)
[HB state abcdef02 offset 3670016 gen 33 stampUS 1696332028433 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 1656800> drv 14.61 lockImpl
4]
- QLogic HBA throwing some warning which used to come when the HBA
goes Unresponsive:
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING:
qlnativefc: vmhba2(3:0.1): Timeout: Max mbx wait reached.
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING: qlnativefc: vmhba2(3:0.1):
Mailbox command 0x54 timeout occurred. Issuing ISP abort.
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING: qlnativefc: vmhba2(3:0.1):
Firmware has been previously dumped (0x4308f69a5000) — ignoring request…
2019-08-27T00:06:45.715Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): Inside
qlnativefcAbortIsp
- Here we can see a reset being initiated on the HBA:
|
[SCSI Code]: Failed: H:0x8 D:0x0 P:0x0 Possible
sense data: 0x0 0x0 0x0. |
ASC/ASCQ: |
00h / 00h |
ASC Details: |
NO ADDITIONAL SENSE INFORMATION |
Status: |
00h – GOOD Description: This status
code indicates that the device has completed the task successfully. |
Hostebyte: |
0x08 – DID_RESET – Reset by somebody. |
Sensekey: |
0h |
Sensekey MSG: |
NO SENSE |
Sensekey Notes: |
Indicates that there is no specific sense key
information to be reported. This may occur for a successful command or for a
command that receives CHECK CONDITION status because one of the FILEMARK.
EOM. or ILI bits is set to one. |
Plugin notes: |
VMK_SCSI_PLUGIN_GOOD – No error. |
2019-08-27T00:06:45.715Z cpu20:1714945)qlnativefc:
vmhba2(3:0.1): Performing ISP error recovery – ha= 0x4308fe061080.
2019-08-27T00:06:45.715Z cpu7:33494)NMP: nmp_ThrottleLogForDevice:3302: Cmd
0x2a (0x43b61a818800, 197862) to dev “naa.60000970000297500084533030391111”
on path “vmhba2:C0:T0:L20” Failed: H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0. Act:EVAL
2019-08-27T00:06:45.715Z cpu7:33494)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device “naa.60000970000297500084533030391111”
state in doubt; requested fast path state update…
2019-08-27T00:06:45.715Z cpu7:33494)ScsiDeviceIO: 2595: Cmd(0x43b61a818800)
0x2a, CmdSN 0x800e002d from world 197862 to dev “naa.60000970000297500084533030391111”
failed H:0x8 D:0x0 P:0x0
2019-08-27T00:06:45.715Z cpu25:33495)NMP: nmp_ThrottleLogForDevice:3302: Cmd
0x28 (0x43be0f423a00, 1715649) to dev “naa.60000970000297500084533030422222”
on path “vmhba2:C0:T0:L2” Failed: H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0. Act:EVAL
2019-08-27T00:06:45.715Z cpu25:33495)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device “naa.60000970000297500084533030422222”
state in doubt; requested fast path state update…
2019-08-27T00:06:45.715Z cpu7:33494)ScsiDeviceIO: 2595: Cmd(0x43b6198012c0)
0x2a, CmdSN 0x800e007c from world 197862 to dev “naa.60000970000297500084533030391111”
failed H:0x8 D:0x0 P:0x0
2019-08-27T00:06:45.715Z cpu25:33495)ScsiDeviceIO: 2613: Cmd(0x43be0f423a00)
0x28, CmdSN 0x15a5c from world 1715649 to dev “naa.60000970000297500084533030422222”
failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2019-08-27T00:06:45.715Z cpu7:33494)ScsiDeviceIO: 2595: Cmd(0x43b600fdc3c0)
0x2a, CmdSN 0x800e0079 from world 197862 to dev “naa.60000970000297500084533030391111”
failed H:0x8 D:0x0 P:0x0
2019-08-27T00:06:45.715Z cpu7:33494)NMP: nmp_ThrottleLogForDevice:3235: last
error status from device naa.60000970000297500084533030391111 repeated 10 times
2019-08-27T00:06:45.715Z cpu7:33494)ScsiDeviceIO: 2613: Cmd(0x43b61a81d180)
0x28, CmdSN 0x15a5b from world 1715654 to dev “naa.60000970000297500084533030422222”
failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2019-08-27T00:06:45.715Z cpu7:33494)NMP: nmp_ThrottleLogForDevice:3302: Cmd
0x2a (0x43b600ee4e40, 32806) to dev “naa.60000970000297500084533030393333”
on path “vmhba2:C0:T0:L15” Failed: H:0x8 D:0x0 P:0x0 Possible sense
data: 0x0 0x0 0x0. Act:EVAL
2019-08-27T00:06:45.715Z cpu7:33494)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device “naa.60000970000297500084533030393333”
state in doubt; requested fast path state update…
2019-08-27T00:06:45.724Z cpu14:33494)NMP: nmp_ThrottleLogForDevice:3235: last
error status from device naa.60000970000297500084533030422222 repeated 20 times
2019-08-27T00:06:45.732Z cpu25:33495)NMP: nmp_ThrottleLogForDevice:3235: last
error status from device naa.60000970000297500084533030422222 repeated 40 times
2019-08-27T00:06:45.746Z cpu14:33494)NMP: nmp_ThrottleLogForDevice:3235: last
error status from device naa.60000970000297500084533030422222 repeated 80 times
- Here we can see that the Timeout has finally reached where the
command started to failed:
2019-08-27T00:06:55.744Z cpu16:34146)Fil3: 15978:
Retry 10 for caller Fil3_FileIO (status ‘IO was aborted by VMFS via a
virt-reset on the device’)
2019-08-27T00:06:55.909Z cpu19:32996)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_2’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:06:55.909Z cpu19:32996)
[HB state abcdef02 offset 3670016 gen 3 stampUS 1696332028488 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 832400> drv 14.61 lockImpl
4]
2019-08-27T00:07:03.105Z cpu18:937761)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_1’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:07:03.105Z cpu18:937761)
[HB state abcdef02 offset 3670016 gen 33 stampUS 1696332028433 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 1656800> drv 14.61 lockImpl
4]
2019-08-27T00:07:07.043Z cpu29:108949)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_1’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:07:07.043Z cpu29:108949)
[HB state abcdef02 offset 3670016 gen 33 stampUS 1696332028433 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 1656800> drv 14.61 lockImpl
4]
2019-08-27T00:07:07.043Z cpu28:108954)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_2’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:07:07.043Z cpu28:108954)
[HB state abcdef02 offset 3670016 gen 3 stampUS 1696332028488 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 832400> drv 14.61 lockImpl
4]
2019-08-27T00:07:15.716Z cpu30:1714943)WARNING: qlnativefc: vmhba2(3:0.1):
Timeout: Max mbx wait reached.
2019-08-27T00:07:15.716Z cpu30:1714943)qlnativefc: vmhba2(3:0.1):
qlnativefcMailboxCommand(1): **** FAILED
mb[0]=54
mb[2]=aaa72019-08-27T00:07:15.716Z cpu30:1714943) cmd=54 ****
2019-08-27T00:07:15.716Z cpu30:1714943)WARNING: ScsiPath: 7174: Set retry
timeout for failed TaskMgmt abort for CmdSN
0x1a7fe6, status Failure, path vmhba2:C0:T0:L15
2019-08-27T00:07:15.716Z cpu9:35647)qlnativefc: vmhba2(3:0.1):
qlnativefcMailboxCommand(1): **** FAILED
mb[0]=4001 mb[1]=4953 mb[2]=5020
mb[3]=25322019-08-27T00:07:15.716Z cpu9:35647) cmd=69 ****
2019-08-27T00:07:15.761Z cpu16:34146)Fil3: 15969: Max retries (20) exceeded for
caller Fil3_FileIO (status ‘IO was aborted by VMFS via a virt-reset on the
device’)
2019-08-27T00:07:15.761Z cpu16:34146)BC: 2440: Failed to write (uncached)
object ‘.iormstats.sf’: Maximum kernel-level retries exceeded
2019-08-27T00:07:15.811Z cpu20:1714945)WARNING: qlnativefc: vmhba2(3:0.1):
Firmware dump previously allocated.
2019-08-27T00:07:15.847Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): Enabling
PUREX.
2019-08-27T00:07:15.859Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): DPORT feature
: disabled.
2019-08-27T00:07:15.859Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): FAWWN feature
: disabled.
2019-08-27T00:07:16.433Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): LOOP UP
detected (8 Gbps).
2019-08-27T00:07:16.817Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): fcport
50000975900150c6 (targetId = 0) ONLINE
2019-08-27T00:07:16.829Z cpu20:1714945)qlnativefc: vmhba2(3:0.1):
qlnativefcMailboxCommand(1): **** FAILED
mb[0]=54
mb[2]=aaa72019-08-27T00:07:16.829Z cpu20:1714945) cmd=54 ****
2019-08-27T00:07:16.829Z cpu20:1714945)WARNING: ScsiPath: 7174: Set retry
timeout for failed TaskMgmt abort for CmdSN
0x285068, status Failure, path vmhba2:C0:T0:L20
2019-08-27T00:07:17.717Z cpu20:1714945)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device “naa.60000970000297500084533030393333”
state in doubt; requested fast path state update…
2019-08-27T00:07:17.717Z cpu20:1714945)ScsiDeviceIO: 2595: Cmd(0x43be0e5b0880)
0xfe, CmdSN 0x1a7fe6 from world 32806 to dev “naa.60000970000297500084533030393333”
failed H:0x8 D:0x0 P:0x0
2019-08-27T00:07:17.764Z cpu12:32859)HBX: 283: ‘DS_Postpaid_Cluster_APPS_1’: HB
at offset 3670016 – Reclaimed heartbeat [Timeout]:
2019-08-27T00:07:17.764Z cpu12:32859)
[HB state abcdef02 offset 3670016 gen 33 stampUS 1696402079932 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 1656800> drv 14.61 lockImpl
4]
2019-08-27T00:07:17.788Z cpu18:937761)HBX: 2851: ‘DS_Postpaid_Cluster_APPS_2’:
HB at offset 3670016 – Waiting for timed out HB:
2019-08-27T00:07:17.788Z cpu18:937761)
[HB state abcdef02 offset 3670016 gen 3 stampUS 1696332028488 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 832400> drv 14.61 lockImpl
4]
2019-08-27T00:07:18.831Z cpu20:1714945)WARNING: NMP:
nmp_DeviceRequestFastDeviceProbe:237: NMP device “naa.60000970000297500084533030391111”
state in doubt; requested fast path state update…
2019-08-27T00:07:18.831Z cpu20:1714945)ScsiDeviceIO: 2595: Cmd(0x43be00ea8980)
0xfe, CmdSN 0x285068 from world 32806 to dev “naa.60000970000297500084533030391111”
failed H:0x8 D:0x0 P:0x0
2019-08-27T00:07:18.870Z cpu15:32857)HBX: 283: ‘DS_Postpaid_Cluster_APPS_2’: HB
at offset 3670016 – Reclaimed heartbeat [Timeout]:
2019-08-27T00:07:18.870Z cpu15:32857)
[HB state abcdef02 offset 3670016 gen 3 stampUS 1696403185720 uuid
5d4a91f0-5d376611-e106-3863bb4a1158 jrnl <FB 832400> drv 14.61 lockImpl
4]
2019-08-27T00:07:18.878Z cpu13:33491)NMP: nmp_ThrottleLogForDevice:3252: last
error status from device naa.60000970000297500084533030391111 repeated 2 times
2019-08-27T00:07:18.878Z cpu13:33491)NMP: nmp_ThrottleLogForDevice:3302: Cmd
0xf1 (0x43b613c78e40, 1715712) to dev “naa.60000970000297500084533030391111”
on path “vmhba1:C0:T0:L20” Failed: H:0x0 D:0x2 P:0x0 Valid sense
data: 0xe 0x1d 0x0. Act:NONE
2019-08-27T00:07:18.878Z cpu13:33491)ScsiDeviceIO: 2613: Cmd(0x43b6190e93c0)
0xfe, CmdSN 0x28508b from world 1715712 to dev “naa.60000970000297500084533030391111”
failed H:0x0 D:0x2 P:0x5 Possible sense data: 0x0 0x0 0x0.
Hostd Logs:
- On the Hostd logs we can see that the Path went down when there is
a reset which is being initiated by the HBA, where we can see almost all
the datastore were affected:
2019-08-27T00:06:54.909Z info hostd[74F9FB70]
[Originator@6876 sub=Vimsvc.ha-eventmgr] Event 9713 : Path redundancy to
storage device naa.60000970000297500084533030422222 degraded. Path
vmhba2:C0:T0:L2 is down. Affected datastores: DS_Prepaid_Cluster6_APPS_1.
2019-08-27T00:06:54.910Z info hostd[74F9FB70] [Originator@6876
sub=Vimsvc.ha-eventmgr] Event 9714 : Path redundancy to storage device naa.60000970000297500084533030391111
degraded. Path vmhba2:C0:T0:L20 is down. Affected datastores:
DS_Postpaid_Cluster_APPS_2.
2019-08-27T00:06:54.910Z info hostd[74F9FB70] [Originator@6876
sub=Vimsvc.ha-eventmgr] Event 9715 : Path redundancy to storage device naa.60000970000297500084533030393333
degraded. Path vmhba2:C0:T0:L15 is down. Affected datastores:
DS_Postpaid_Cluster_APPS_1.
2019-08-27T00:06:54.911Z info hostd[746CBB70] [Originator@6876
sub=Vimsvc.ha-eventmgr] Event 9716 : Path redundancy to storage device
naa.60000970000297500084533030423842 degraded. Path vmhba2:C0:T0:L10 is down.
Affected datastores: DS_Prepaid_Cluster6_APPS_3.
2019-08-27T00:06:54.917Z info hostd[72EB1B70] [Originator@6876
sub=Vimsvc.ha-eventmgr] Event 9726 : Path redundancy to storage device
naa.60000970000297500084533030423832 degraded. Path vmhba2:C0:T0:L1 is down.
Affected datastores: DS_Prepaid_Cluster6
VOBD Logs:
- From the VOBD Logs we can see the same path state changing probably
came when the HBA Initiated a Reset:
2019-08-27T00:06:54.907Z: [scsiCorrelator]
1696379222751us: [vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L2
changed state from on
2019-08-27T00:06:54.908Z: [scsiCorrelator] 1696369608926us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030422222 degraded. Path vmhba2:C0:T0:L2 is down.
Affected datastores: “DS_Prepaid_Cluster6_APPS_1”.
2019-08-27T00:06:54.908Z: [scsiCorrelator] 1696379222755us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L20 changed state from
on
2019-08-27T00:06:54.909Z: [scsiCorrelator] 1696369609470us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030391111 degraded. Path vmhba2:C0:T0:L20 is down.
Affected datastores: “DS_Postpaid_Cluster_APPS_2”.
2019-08-27T00:06:54.909Z: [scsiCorrelator] 1696379222760us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L15 changed state from
on
2019-08-27T00:06:54.909Z: [scsiCorrelator] 1696369609960us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030393333 degraded. Path vmhba2:C0:T0:L15 is down.
Affected datastores: “DS_Postpaid_Cluster_APPS_1”.
2019-08-27T00:06:54.909Z: [scsiCorrelator] 1696379223358us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L10 changed state from
on
2019-08-27T00:06:54.910Z: [scsiCorrelator] 1696369610482us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030423842 degraded. Path vmhba2:C0:T0:L10 is down.
Affected datastores: “DS_Prepaid_Cluster6_APPS_3”.
2019-08-27T00:06:54.910Z: [scsiCorrelator] 1696379224049us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L11 changed state from
on
2019-08-27T00:06:54.910Z: [scsiCorrelator] 1696369610980us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030423843 degraded. Path vmhba2:C0:T0:L11 is down.
Affected datastores: “DS_Prepaid_Cluster6_APPS_3”.
2019-08-27T00:06:54.924Z: [scsiCorrelator] 1696379239388us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L22 changed state from
on
2019-08-27T00:06:54.924Z: [scsiCorrelator] 1696369624918us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030394143 degraded. Path vmhba2:C0:T0:L22 is down.
Affected datastores: “DS_Postpaid_Cluster_APPS_2”.
2019-08-27T00:06:54.924Z: [scsiCorrelator] 1696379240069us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L23 changed state from
on
2019-08-27T00:06:54.925Z: [scsiCorrelator] 1696369625585us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030394144 degraded. Path vmhba2:C0:T0:L23 is down.
Affected datastores: “DS_Postpaid_Cluster_APPS_2”.
2019-08-27T00:06:54.925Z: [scsiCorrelator] 1696379240767us:
[vob.scsi.scsipath.pathstate.dead] scsiPath vmhba2:C0:T0:L24 changed state from
on
2019-08-27T00:06:54.926Z: [scsiCorrelator] 1696369626301us:
[esx.problem.storage.redundancy.degraded] Path redundancy to storage device
naa.60000970000297500084533030394145 degraded. Path vmhba2:C0:T0:L24 is down.
Affected datastores: “DS_Postpaid_Cluster_APPS_2”.
- After the reset operation got comepleted we can see that the Path
came online and everything went fine.
2019-08-27T00:07:16.939Z: [scsiCorrelator]
1696401255008us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L10
changed state from dead
2019-08-27T00:07:16.940Z: [scsiCorrelator] 1696391640751us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030423842 (Datastores:
“DS_Prepaid_Cluster6_APPS_3”) restored. Path vmhba2:C0:T0:L10 is
active again.
2019-08-27T00:07:16.942Z: [scsiCorrelator] 1696401257471us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L11 changed state from
dead
2019-08-27T00:07:16.942Z: [scsiCorrelator] 1696391642927us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030423843 (Datastores:
“DS_Prepaid_Cluster6_APPS_3”) restored. Path vmhba2:C0:T0:L11 is
active again.
2019-08-27T00:07:16.944Z: [scsiCorrelator] 1696401259885us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L12 changed state from
dead
2019-08-27T00:07:16.945Z: [scsiCorrelator] 1696391645322us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030423844 (Datastores:
“DS_Prepaid_Cluster6_APPS_3”) restored. Path vmhba2:C0:T0:L12 is
active again.
2019-08-27T00:07:16.947Z: [scsiCorrelator] 1696401262305us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L13 changed state from
dead
2019-08-27T00:07:16.947Z: [scsiCorrelator] 1696391647749us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030423845 (Datastores: “DS_Prepaid_Cluster6_APPS_3”)
restored. Path vmhba2:C0:T0:L13 is active again.
2019-08-27T00:07:16.949Z: [scsiCorrelator] 1696401264786us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L14 changed state from
dead
2019-08-27T00:07:16.950Z: [scsiCorrelator] 1696391650270us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030423846 (Datastores:
“DS_Prepaid_Cluster6_APPS_3”) restored. Path vmhba2:C0:T0:L14 is
active again.
2019-08-27T00:07:16.952Z: [scsiCorrelator] 1696401267478us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L15 changed state from
dead
2019-08-27T00:07:16.952Z: [scsiCorrelator] 1696391652879us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60000970000297500084533030393333
(Datastores: “DS_Postpaid_Cluster_APPS_1”) restored. Path
vmhba2:C0:T0:L15 is active again.
2019-08-27T00:07:16.954Z: [scsiCorrelator] 1696401269813us:
[vob.scsi.scsipath.pathstate.on] scsiPath vmhba2:C0:T0:L16 changed state from
dead
2019-08-27T00:07:16.955Z: [scsiCorrelator] 1696391655216us:
[esx.clear.storage.redundancy.restored] Path redundancy to storage device
naa.60000970000297500084533030394136 (Datastores:
“DS_Postpaid_Cluster_APPS_1”) restored. Path vmhba2:C0:T0:L16 is
active again.
2019-08-27T00:07:17.765Z: [vmfsCorrelator]
1696392465399us: [esx.problem.vmfs.heartbeat.recovered]
5d4cd6c0-2d6003b1-afa6-3863bb4a1158 DS_Postpaid_Cluster_APPS_1
2019-08-27T00:07:18.871Z: [vmfsCorrelator] 1696393571426us:
[esx.problem.vmfs.heartbeat.recovered] 5d4cd7af-327f5b20-6bbe-3863bb4a1158
DS_Postpaid_Cluster_APPS_2
Conclusion:
- As per the below Events:
2019-08-27T00:06:06.100Z cpu25:33495)IntrCookie:
3429: Interrupt received on invalid vector (cpu 25, vector 76); ignoring it.
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING:
qlnativefc: vmhba2(3:0.1): Timeout: Max mbx wait reached.
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING: qlnativefc: vmhba2(3:0.1):
Mailbox command 0x54 timeout occurred. Issuing ISP abort.
2019-08-27T00:06:45.715Z cpu20:1714945)WARNING: qlnativefc: vmhba2(3:0.1):
Firmware has been previously dumped (0x4308f69a5000) — ignoring request…
2019-08-27T00:06:45.715Z cpu20:1714945)qlnativefc: vmhba2(3:0.1): Inside
qlnativefcAbortIsp
- This is the reason why we have noticed the issue in the
environment.
- As per my Research this is a Known issue with QLogic 8Gb FC HBA and HPE Gen8 platform
- As per the Article: https://kb.vmware.com/s/article/2149592
The vmkernel alert shown indicates that ESXi has
received an interrupt on an unexpected processor interrupt vector and has
ignored the interrupt because ESXi does not know the source of the interrupt or
which software handler to invoke.
Reception of such an interrupt is not normal and
may cause loss of I/O connectivity and/or ESXi PSODs.
In order to distribute the interrupt load evenly
across processors, ESXi performs interrupt re-balancing. As part of this
process, ESXi may modify the target processor and interrupt vector assigned to
an I/O device. This process is known as I/O device interrupt migration.
When the VT-d interrupt remapper is not present
in a platform or its use is disabled in ESXi (boot
option iovDisableIR=TRUE), ESXi performs I/O device interrupt migration by
modifying the PCI MSI/MSI-X registers in the I/O device(s).
VMware has recently received reports that some
I/O devices do not react well when ESXi performs interrupt migration by
programming the device’s MSI/MSI-X registers. Specifically, some I/O devices
may interrupt in a processor vector which ESXi no longer uses and considers
invalid, causing the generation of the alert message and its subsequent
effects. This issue is only present on 8G FC connection.
- To Understand more about this issue I will recommend that you check
with HP if there is any fix released for the issue mentioned above because
as per Vmware Engineering team this is a Third Party issue and They will
have to find a fix while iovDisableIR is enabled.
Workaround:
For systems that support, enable Intel VT-d Interrupt Remapper.
See the VMware KB article, ESXi IO connectivity issues or PSOD with VT-d interrupt remapper disabled (2149592) .
- Connect to the ESXi host with an SSH session and root credentials.
- Run this command:
esxcli system settings kernel set –setting=iovDisableIR -v FALSE
- Reboot the ESXi host.
- Ensure the iovDisableIR setting is set to FALSE with
this command:
esxcli system settings kernel list -o iovDisableIR