RCA-7: Storage Latency issue on vCenter Infra


Host Name: esx-icpmcc00305

 

Datastore Name                   

 naa / t10 / eui ID                   

Volume UUID                         

LUN ID 

Medium 

Number Path  

Vendor

ICP_VSI_****_009_SI_125b_0d77_N_G 

eui.65a29ccb3448125bfb4d0d770*****04 

598ab822-989f5084-04ca-246e961e97f0 

LUN:4  

fc     

1            

EMC Fibre Channel

 

 

 

VOBD:

 

    • From the VOBD Logs we can see that the issue started with Power-On Reset started happening on EUI.65a29ccb3448125bfb4d0d770*****04.

 

2019-02-13T02:16:38.187Z: [scsiCorrelator] 11662981439480us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.65a29ccb3448125bfb4d0d770*****04

2019-02-13T02:18:24.372Z: [scsiCorrelator] 11663087623696us: [vob.scsi.scsipath.por] Power-on Reset occurred on eui.65a29ccb3448125bfb4d0d770*****04
2019-02-13T02:18:24.373Z: [scsiCorrelator] 11663127568230us: [esx.problem.storage.connectivity.devicepor] Frequent PowerOn Reset Unit Attentions are occurring on device eui.65a29ccb3448125bfb4d0d770*****04. This may indicate a storage problem. Affected datastores: “ICP_VSI_****_009_SI_125b_0d77_N_G”.

 

 

VMKernel:

 

    • At the time Mentioned we are only able to see 1 Event

 

2019-02-13T12:36:00.270Z cpu21:33716)ScsiDeviceIO: 2354: Cmd(0x412eccb81340) 0x85, CmdSN 0x4c2c from world 34501 to dev “naa.61866da06e702c00210c7ebc*****c59” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    • From the VMKernel logs we can see the issue with the IO-Error after which a power-on reset happened:

 

2019-02-13T02:16:38.187Z cpu36:33502)WARNING: [11662981441] IO-ERROR comb: 295f00020015. offsetInComb 352. SizeInLB 1. SDS_ID 116891ed00000002. Comb Gen 29. Head Gen 9e87.
2019-02-13T02:16:38.187Z cpu36:33502)WARNING: Vol ID 0xfb4d0d7700000004. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status IO_FAULT_DEVICE_ERROR(362) Reason (old SCSI gen without retry) Retry count (0) chan (2)
2019-02-13T02:16:38.187Z cpu36:33502)scini: blkScsi_PrintIOInfo:3274: ScaleIO R2_01:hCmd 0x412ecd626440, OpCode 0x89, rc 51 scsiStat 2, senseCode 6, asc 41, ascq 2

 

2019-02-13T02:16:38.187Z cpu36:33502)NMP: nmp_ThrottleLogForDevice:2457: Cmd 0x89 (0x412ecd626440, 32813) to dev “eui.65a29ccb3448125bfb4d0d770*****04” on path “vmhba32:C0:T4:L4” Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x2. Act:NONE

 

2019-02-13T02:16:38.187Z cpu36:33502)ScsiDeviceIO: 2331: Cmd(0x412ecd626440) 0x89, CmdSN 0x8c5a3b from world 32813 to dev “eui.65a29ccb3448125bfb4d0d770*****04” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x2.

 

2019-02-13T02:16:38.187Z cpu36:33502)ScsiCore: 1486: Power-on Reset occurred on eui.65a29ccb3448125bfb4d0d770*****04

 

 

    • Unable to find any events on Host name : esx-icpmcc00314 and esx-icpmccd00648 at the time mentioned.

 

Conclusion:

 

    • At the time mentioned in the last email which is 12:20 am GMT we were not able to find any events associated to Esxi.
    • However if we see the events around 2:00 AM we can say that SAN might become heavily congested, which can cause I/O requests to take a long time to complete. The virtual machine storage performance is degraded.

 

Recommendation: 

 

Because the messages are caused by activity on the array outside of the ESX host’s scope, the storage administrator should use storage-specific tools to isolate the cause of the problem.

For More information please refer: https://kb.vmware.com/kb/1020702

 

 

 

 

 

 

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.

Leave a Reply