Hostname: xyzsv010321.intranet.abc.com
Uptime: 7.9 days (11363 minutes)
ESXi Version: VMware ESXi 6.7.0 build-17700523
Object Health:
------------------------------------------------------------
nonavailabilityrelatedincompliancewithpolicypendingfailed    0
reduced-availability-with-no-rebuild-delay-timer             0
reducedavailabilitywithpolicypending                         0
inaccessible                                                 0
reduced-availability-with-active-rebuild                     0
nonavailability-related-incompliance                         0
reducedavailabilitywithpausedrebuild                         0
nonavailabilityrelatedincompliancewithpausedrebuild          0
nonavailability-related-reconfig                             0
reduced-availability-with-no-rebuild                         0
nonavailabilityrelatedincompliancewithpolicypending          0
reducedavailabilitywithpolicypendingfailed                   0
data-move                                                    0
healthy                                                      1674
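- The per-state object counts above can be re-checked directly on the host if needed. A minimal example, assuming shell access to this ESXi 6.7 host (command availability can vary by build; verify with esxcli vsan debug --help):
   esxcli vsan debug object health summary get    # per-state vSAN object counts
   esxcli vsan debug disk list                    # per-disk vSAN health overview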
VOBD Logs:
- From the VOBD logs we can see that a repair operation was triggered due to I/O failures, and the vSAN devices subsequently went offline.
2022-08-16T13:17:54.827Z: [vSANCorrelator] 399494474us:
[vob.vsan.lsom.devicerepair] vSAN device 52385363-7081-d832-fa4f-88200d001cb5
is being repaired due to I/O failures, and will be out of service until the
repair is complete. If the device is part of a dedup disk group, the entire
disk group will be out of service until the repair is complete.
2022-08-16T13:17:54.827Z: [vSANCorrelator] 399494779us:
[esx.problem.vob.vsan.lsom.devicerepair] Device
52385363-7081-d832-fa4f-88200d001cb5 is in offline state and is getting
repaired.
2022-08-16T13:17:58.844Z: [vSANCorrelator] 403511359us:
[vob.vsan.pdl.offline] vSAN device 528dd82a-9af8-a4ff-2982-653e28d011ce has
gone offline.
2022-08-16T13:17:58.844Z: [vSANCorrelator] 403511548us:
[esx.problem.vob.vsan.pdl.offline] vSAN device
528dd82a-9af8-a4ff-2982-653e28d011ce has gone offline.
2022-08-16T13:17:58.844Z: An event (esx.problem.vob.vsan.pdl.offline)
could not be sent immediately to hostd; queueing for retry.
2022-08-16T13:17:58.844Z: [vSANCorrelator] 403511367us:
[vob.vsan.pdl.offline] vSAN device 52824081-ba45-2e25-e41c-03338f894606 has
gone offline.
2022-08-16T13:17:58.844Z: [vSANCorrelator] 403511609us:
[esx.problem.vob.vsan.pdl.offline] vSAN device
52824081-ba45-2e25-e41c-03338f894606 has gone offline.
2022-08-16T13:17:58.844Z: An event (esx.problem.vob.vsan.pdl.offline)
could not be sent immediately to hostd; queueing for retry.
2022-08-16T13:17:58.844Z: [vSANCorrelator] 403511392us:
[vob.vsan.pdl.offline] vSAN device 52d7cbcd-30f3-b646-7174-32e28da7dbbb has
gone offline.
2022-08-16T13:18:01.786Z: [vSANCorrelator] 406453743us:
[vob.vsan.net.reconfigured] vmknic vmk2 has been reconfigured.
2022-08-16T13:18:01.786Z: [vSANCorrelator] 406453984us:
[esx.audit.vsan.net.vnic.added] vSAN vnic added
2022-08-16T13:18:01.787Z: An event
(esx.audit.vsan.net.vnic.added) could not be sent immediately to hostd;
queueing for retry.
2022-08-16T13:18:40.574Z: [GenericCorrelator] 444823152us:
[vob.user.host.boot] Host has booted.
2022-08-16T13:18:40.574Z: [UserLevelCorrelator]
444823152us: [vob.user.host.boot] Host has booted.
2022-08-16T13:18:40.574Z: [UserLevelCorrelator] 444823659us:
[esx.audit.host.boot] Host has booted.
2022-08-16T13:23:41.329Z: [vSANCorrelator] 745548641us:
[vob.vsan.lsom.devicerepair] vSAN device 52385363-7081-d832-fa4f-88200d001cb5
is being repaired due to I/O failures, and will be out of service until the
repair is complete. If the device is part of a dedup disk group, the entire
disk group will be out of service until the repair is complete.
2022-08-16T13:23:41.329Z: [vSANCorrelator] 745577903us:
[esx.problem.vob.vsan.lsom.devicerepair] Device
52385363-7081-d832-fa4f-88200d001cb5 is in offline state and is getting
repaired.
2022-08-16T13:39:12.864Z: [vSANCorrelator] 1677004102us:
[vob.vsan.lsom.devicerepair] vSAN device 52385363-7081-d832-fa4f-88200d001cb5
is being repaired due to I/O failures, and will be out of service until the
repair is complete. If the device is part of a dedup disk group, the entire
disk group will be out of service until the repair is complete.
- Here we can see the devices reporting permanent errors:
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606646322us:
[vob.vsan.lsom.diskerror] vSAN device 52d7cbcd-30f3-b646-7174-32e28da7dbbb is
under permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606835399us:
[esx.problem.vob.vsan.lsom.diskerror] vSAN device
52d7cbcd-30f3-b646-7174-32e28da7dbbb is under permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606646331us:
[vob.vsan.lsom.diskerror] vSAN device 52d7cbcd-30f3-b646-7174-32e28da7dbbb is
under permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606835490us:
[esx.problem.vob.vsan.lsom.diskerror] vSAN device
52d7cbcd-30f3-b646-7174-32e28da7dbbb is under permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606646388us:
[vob.vsan.lsom.diskpropagatedpermerror] vSAN device
52824081-ba45-2e25-e41c-03338f894606 is under propagated permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606835535us:
[esx.problem.vob.vsan.lsom.diskpropagatedpermerror] vSAN device
52824081-ba45-2e25-e41c-03338f894606 is under propagated permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606646401us:
[vob.vsan.lsom.diskpropagatedpermerror] vSAN device
528dd82a-9af8-a4ff-2982-653e28d011ce is under propagated permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606835574us:
[esx.problem.vob.vsan.lsom.diskpropagatedpermerror] vSAN device
528dd82a-9af8-a4ff-2982-653e28d011ce is under propagated permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606646415us:
[vob.vsan.lsom.diskpropagatedpermerror] vSAN device
52385363-7081-d832-fa4f-88200d001cb5 is under propagated permanent error.
2022-08-16T13:54:42.586Z: [vSANCorrelator] 2606835612us:
[esx.problem.vob.vsan.lsom.diskpropagatedpermerror] vSAN device
52385363-7081-d832-fa4f-88200d001cb5 is under propagated permanent error.
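- The repair, offline, and permanent-error events above can be pulled from the on-host log for a quick recheck. A minimal filter, assuming the default log location /var/log/vobd.log on this host:
   grep -E "devicerepair|pdl.offline|diskerror|diskpropagatedpermerror" /var/log/vobd.log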
VMkernel Logs:
2022-08-16T13:23:37.311Z cpu6:2098447)NMP:
nmp_ThrottleLogForDevice:3872: Cmd 0x28 (0x459bc1c9abc0, 0) to dev
“naa.5002538bc9916d70” on path “vmhba7:C0:T8:L0” Failed:
H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0. Act:NONE
2022-08-16T13:23:37.311Z cpu6:2098447)ScsiDeviceIO: 3483:
Cmd(0x459bc1c9abc0) 0x28, CmdSN 0x2875 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:37.812Z cpu2:2098447)ScsiDeviceIO: 3483:
Cmd(0x459bc1d3f800) 0x28, CmdSN 0x357e from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:38.314Z cpu20:2098447)ScsiDeviceIO: 3483:
Cmd(0x459bc1d3e180) 0x28, CmdSN 0x40a4 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:38.816Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x459f3498fa80) 0x28, CmdSN 0x44c9 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:39.317Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x459f34928100) 0x28, CmdSN 0x475b from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:39.819Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x459f348f4440) 0x28, CmdSN 0x4769 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:40.320Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a222252c00) 0x28, CmdSN 0x476a from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:40.822Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a2223c5c40) 0x28, CmdSN 0x476b from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:23:41.324Z cpu3:2098447)ScsiDeviceIO: 3483:
Cmd(0x459bc1c20640) 0x28, CmdSN 0x476c from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:08.848Z cpu43:2098448)NMP:
nmp_ThrottleLogForDevice:3872: Cmd 0x28 (0x45bbc17859c0, 0) to dev
“naa.5002538bc9916d70” on path “vmhba7:C0:T8:L0” Failed:
H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0. Act:NONE
2022-08-16T13:39:08.848Z cpu43:2098448)ScsiDeviceIO: 3483:
Cmd(0x45bbc17859c0) 0x28, CmdSN 0x27b7 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:09.349Z cpu36:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a3d58cb500) 0x28, CmdSN 0x3418 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:09.850Z cpu36:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a3d596d440) 0x28, CmdSN 0x3f60 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:10.351Z cpu5:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a3d59adf40) 0x28, CmdSN 0x44b3 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:10.851Z cpu5:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a3d59f4f80) 0x28, CmdSN 0x474e from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
2022-08-16T13:39:11.354Z cpu36:2098447)ScsiDeviceIO: 3483:
Cmd(0x45a3d58cad80) 0x28, CmdSN 0x4769 from world 0 to dev
“naa.5002538bc9916d70” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3
0x11 0x0.
- Under the VMkernel logs we can see that naa.5002538bc9916d70 is returning SCSI sense key 0x3, which per the table below is a Medium Error: READ(10) commands are failing with an unrecovered read error. This tells us that the device itself is having issues.
Type                  | Code  | Name                   | Description
Host Status           | [0x0] | OK                     | Returned when there is no error on the host side. You will see this when there is a status for a Device or Plugin, and when Valid sense data is reported instead of Possible sense data.
Device Status         | [0x2] | CHECK_CONDITION        | Returned when a command fails for a specific reason. When a CHECK CONDITION is received, the ESXi storage stack sends SCSI command 0x3 (REQUEST SENSE) to retrieve the sense data (Sense Key, Additional Sense Code, ASC Qualifier, and other bits). The sense data is listed after "Valid sense data" in the order Sense Key, Additional Sense Code, ASC Qualifier.
Plugin Status         | [0x0] | GOOD                   | No error. (ESXi 5.x / 6.x only)
Sense Key             | [0x3] | MEDIUM ERROR           |
Additional Sense Data | 11/00 | UNRECOVERED READ ERROR |
OP Code               | 0x28  | READ(10)               |
Conclusion:
- Based on the logs, SCSI sense key 0x3 (Medium Error) is being returned for naa.5002538bc9916d70. This is an I/O device failure due to a medium error.
Action Plan:
- Please replace the disk with NAA ID 5002538bc9916d70. Below is the information you can share with the Dell team so that they can isolate this device:
naa.5002538bc9916d70:
   Device: naa.5002538bc9916d70
   Display Name: naa.5002538bc9916d70
   Is SSD: true
   VSAN UUID: 528dd82a-9af8-a4ff-2982-653e28d011ce
   VSAN Disk Group UUID: 52385363-7081-d832-fa4f-88200d001cb5
   VSAN Disk Group Name: t10.NVMe____Dell_Express_Flash_NVMe_P4610_1.6TB_SFF_00016FEA25E4D25C
   Used by this host: true
   In CMMDS: true
   On-disk format version: 10
   Deduplication: true
   Compression: true
   Checksum: 1200740824495245392
   Checksum OK: true
   Is Capacity Tier: true
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Fri Jul 15 12:32:04 2022
naa.5002538bc9916d70:
   Device Display Name: Local SAMSUNG Disk (naa.5002538bc9916d70)
   Storage Array Type: VMW_SATP_LOCAL
   Storage Array Type Device Config: SATP VMW_SATP_LOCAL does not support device configuration.
   Path Selection Policy: VMW_PSP_FIXED
   Path Selection Policy Device Config: {preferred=vmhba7:C0:T8:L0;current=vmhba7:C0:T8:L0}
   Path Selection Policy Device Custom Config:
   Working Paths: vmhba7:C0:T8:L0
   Is USB: false
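- The device details above correspond to the standard esxcli views and can be re-collected on the host if the Dell team needs fresh output. A minimal example, assuming shell access (the vSAN listing is shown here trimmed to the affected device):
   esxcli vsan storage list                                   # vSAN disk / disk group membership details
   esxcli storage nmp device list -d naa.5002538bc9916d70     # NMP path, SATP and PSP details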
- Please follow KB https://kb.vmware.com/s/article/2149067 to replace the storage device. Note that since deduplication is enabled, the entire disk group will be in a failed state.
- The disk group must be destroyed first with the "No data migration" option (as the disk group is effectively lost), then the failed disk replaced and the disk group re-created.
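- If the removal and re-creation are done from the host command line instead of the vSphere Client, a minimal sketch is below. It assumes the cache-tier device of disk group 52385363-7081-d832-fa4f-88200d001cb5 has been identified first; <cache-tier-naa-id> and <new-capacity-naa-id> are placeholders, and the options should be verified with esxcli vsan storage remove --help before use:
   # remove the whole disk group via its cache-tier device, with no data evacuation
   esxcli vsan storage remove -s <cache-tier-naa-id> -m noAction
   # after the failed disk is physically replaced, re-create the disk group
   esxcli vsan storage add -s <cache-tier-naa-id> -d <new-capacity-naa-id>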