RCA – 21 – VM Unresponsive while taking Production Checkpoints

Issue Description:

 

Virtual machines on the cluster GP-PROD-CLUSTER, running Microsoft Windows Server 2016 Datacenter (Version 10.0.14393, Build 14393), stop responding when a production checkpoint is created.

 

Issue reproduced on the 13th.

_____________________________________________________________________________

 

System Information: ARES

 

OS Name        Microsoft Windows Server 2016 Datacenter

Version        10.0.14393 Build 14393

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ARES

System Manufacturer        Dell Inc.

System Model        PowerEdge R630

System Type        x64-based PC

System SKU        SKU=NotProvided;ModelName=PowerEdge R630

Processor        Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 2601 Mhz, 14 Core(s), 28 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 2601 Mhz, 14 Core(s), 28 Logical Processor(s)

BIOS Version/Date        Dell Inc. 2.4.3, 17/01/2017

 

System Events:

 

  • The following errors were logged at 12:20 AM: the VM could not be registered because its configuration file could not be found.

 

Date        9/13/2017
Time        12:20:29 AM
Type/Level        Error
Computer Name        ARES.ABC
Event Code        21502
Source        Microsoft-Windows-Hyper-V-High-Availability
Description        ‘Virtual Machine Configuration GPDYNAMICS-TEST’ failed to register the virtual machine with the virtual machine management service. The Virtual Machine Management Service failed to register the configuration for the virtual machine ‘103A58D4-8E02-4FD3-B121-DC56D0551082’ at ‘C:\ClusterStorage\DS-PROD09\gpdynamics-test\GPDYNAMICS-TEST’: The system cannot find the path specified. (0x80070003). If the virtual machine is managed by a failover cluster, ensure that the file is located at a path that is accessible to other nodes of the cluster.

Date        9/13/2017
Time        12:20:29 AM
Type/Level        Error
Computer Name        ARES.ABC
Event Code        1069
Source        Microsoft-Windows-FailoverClustering
Description        Cluster resource ‘Virtual Machine Configuration GPDYNAMICS-TEST’ of type ‘Virtual Machine Configuration’ in clustered role ‘GPDYNAMICS-TEST’ failed. The error code was ‘0x3’ (‘The system cannot find the path specified.’). Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
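The two ARES events report the same failure in two forms: 0x80070003 in event 21502 and 0x3 in event 1069. As a minimal sketch (Python is used purely for illustration, and the function name decode_hresult is hypothetical), the HRESULT can be split into its standard fields to show that it wraps Win32 error 3:

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    return {
        "failure": bool((hresult >> 31) & 0x1),  # bit 31: 1 means failure
        "facility": (hresult >> 16) & 0x7FF,     # bits 16-26: originating facility
        "code": hresult & 0xFFFF,                # bits 0-15: underlying status code
    }


FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a plain Win32 error

# Both values are taken from the event descriptions in this report.
for value in (0x80070003, 0x800705AA):
    fields = decode_hresult(value)
    wraps_win32 = fields["facility"] == FACILITY_WIN32
    print(f"{value:#010x}: win32={wraps_win32} code={fields['code']:#x}")
```

The same decoding applies to 0x800705AA logged on MORPHEUS, whose code field is 0x5aa, the value the cluster service reports in its own event.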

 

Application Events:

 

 

  • Reviewed the Application log; no relevant events were found.

 

List of outdated drivers:

 

Time/Date String    Product Version      File Version     Company Name          File Description
5/25/2016 8:01      (7.13:65.105)        (7.13:65.105)    QLogic Corporation    QLogic 10 GigE VBD
3/4/2016 21:22      (10.0:11105.1001)    (6.603:6.0)      Avago Technologies    MEGASAS RAID Controller Driver for Windows
5/16/2016 2:28      (7.13:57.103)        (7.13:57.103)    QLogic Corporation    AMD64 BXND NDIS6.0 Driver
3/4/2016 21:46      (6.3:9600.16384)     (12.15:22.6)     Intel Corporation     Intel(R) Gigabit Adapter NDIS 6.x driver
4/5/2017 23:45      (9.5:0.1015)         (9.5:0.1015)     Veeam Software AG     CTK file system minifilter
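To put the driver dates in context, the sketch below computes how old each ARES driver was on the date of the ARES events. It is only an illustration: the reference date of 9/13/2017, the month/day/year reading of the timestamps, and the helper name driver_age_days are assumptions, and the report does not state what age makes a driver outdated.

```python
from datetime import datetime

# Driver timestamps copied from the ARES list in this report.
DRIVERS = [
    ("QLogic 10 GigE VBD", "5/25/2016 8:01"),
    ("MEGASAS RAID Controller Driver for Windows", "3/4/2016 21:22"),
    ("AMD64 BXND NDIS6.0 Driver", "5/16/2016 2:28"),
    ("Intel(R) Gigabit Adapter NDIS 6.x driver", "3/4/2016 21:46"),
    ("CTK file system minifilter", "4/5/2017 23:45"),
]

# Assumption: the date of the ARES events is used as the reference point.
REFERENCE = datetime(2017, 9, 13)


def driver_age_days(stamp: str, reference: datetime = REFERENCE) -> int:
    """Return the whole days between a driver timestamp and the reference date."""
    built = datetime.strptime(stamp, "%m/%d/%Y %H:%M")  # assumed month/day/year
    return (reference - built).days


# List the drivers from oldest to newest.
for name, stamp in sorted(DRIVERS, key=lambda d: driver_age_days(d[1]), reverse=True):
    print(f"{driver_age_days(stamp):4d} days  {name}")
```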

 

 

_______________________________________________________________________

 

 

System Information: MORPHEUS

 

OS Name        Microsoft Windows Server 2016 Datacenter

Version        10.0.14393 Build 14393

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        MORPHEUS

System Manufacturer        Dell Inc.

System Model        PowerEdge R630

System Type        x64-based PC

System SKU        SKU=NotProvided;ModelName=PowerEdge R630

Processor        Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 2601 Mhz, 14 Core(s), 28 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, 2601 Mhz, 14 Core(s), 28 Logical Processor(s)

BIOS Version/Date        Dell Inc. 2.4.3, 17/01/2017

 

 

System Events:

 

  • The logs show that the Cluster Shared Volume DS-PROD09 entered a paused state, followed by iSCSI events indicating that the connection to the target was lost and that the initiator could not send an iSCSI PDU.

 

Date        9/11/2017
Time        8:50:40 AM
Type/Level        Warning
Computer Name        MORPHEUS.ABC
Event Code        5120
Source        Microsoft-Windows-FailoverClustering
Description        Cluster Shared Volume ‘DS-PROD09’ (‘DS-PROD09’) has entered a paused state because of ‘STATUS_CONNECTION_DISCONNECTED(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Date        9/11/2017
Time        8:51:12 AM
Type/Level        Warning
Computer Name        MORPHEUS.ABC
Event Code        5120
Source        Microsoft-Windows-FailoverClustering
Description        Cluster Shared Volume ‘DS-PROD09’ (‘DS-PROD09’) has entered a paused state because of ‘STATUS_CONNECTION_DISCONNECTED(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Date        9/11/2017
Time        8:51:37 AM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        20
Source        iScsiPrt
Description        Connection to the target was lost. The initiator will attempt to retry the connection.

Date        9/11/2017
Time        8:51:37 AM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        7
Source        iScsiPrt
Description        The initiator could not send an iSCSI PDU. Error status is given in the dump data.
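The ordering of the four MORPHEUS storage events can be checked with a short sketch that sorts them and prints each event's offset from the first one. The timestamps and event codes are copied from this report; the helper names parse and offsets_from_first are hypothetical.

```python
from datetime import datetime

# (timestamp, event code, source) for the CSV and iSCSI events on MORPHEUS.
EVENTS = [
    ("9/11/2017 8:50:40 AM", 5120, "Microsoft-Windows-FailoverClustering"),
    ("9/11/2017 8:51:12 AM", 5120, "Microsoft-Windows-FailoverClustering"),
    ("9/11/2017 8:51:37 AM", 20, "iScsiPrt"),
    ("9/11/2017 8:51:37 AM", 7, "iScsiPrt"),
]


def parse(stamp: str) -> datetime:
    """Parse a month/day/year timestamp with a 12-hour clock."""
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def offsets_from_first(events):
    """Return (seconds since first event, code, source), oldest event first."""
    ordered = sorted(events, key=lambda event: parse(event[0]))
    start = parse(ordered[0][0])
    return [
        (int((parse(stamp) - start).total_seconds()), code, source)
        for stamp, code, source in ordered
    ]


for seconds, code, source in offsets_from_first(EVENTS):
    print(f"+{seconds:3d}s  event {code:<5d} {source}")
```

Both CSV pause warnings are logged within the minute before the two iScsiPrt errors, which matches the sequence described above.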

 

  • The virtual machine failed to start because it could not reserve resources, with the error: Insufficient system resources exist to complete the requested service.

 

Date        9/11/2017
Time        11:58:27 AM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        21502
Source        Microsoft-Windows-Hyper-V-High-Availability
Description        ‘Virtual Machine GPDYNAMICS-TEST’ failed to start. ‘GPDYNAMICS-TEST’ failed to start. (Virtual machine ID 103A58D4-8E02-4FD3-B121-DC56D0551082) ‘GPDYNAMICS-TEST’ Synthetic Ethernet Port: Failed to finish reserving resources with Error ‘Insufficient system resources exist to complete the requested service.’ (0x800705AA). (Virtual machine ID 103A58D4-8E02-4FD3-B121-DC56D0551082) ‘GPDYNAMICS-TEST’ failed to allocate resources while connecting to a virtual network: Insufficient system resources exist to complete the requested service. (0x800705AA) (Virtual Machine ID 103A58D4-8E02-4FD3-B121-DC56D0551082). The Ethernet switch may not exist. Could not find Ethernet switch ‘OffNetwork’.

Date        9/11/2017
Time        11:58:27 AM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        1069
Source        Microsoft-Windows-FailoverClustering
Description        Cluster resource ‘Virtual Machine GPDYNAMICS-TEST’ of type ‘Virtual Machine’ in clustered role ‘GPDYNAMICS-TEST’ failed. The error code was ‘0x5aa’ (‘Insufficient system resources exist to complete the requested service.’). Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Date        9/11/2017
Time        11:58:40 AM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        1205
Source        Microsoft-Windows-FailoverClustering
Description        The Cluster service failed to bring clustered role ‘GPDYNAMICS-TEST’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

 

  • VMQ (virtual machine queue) allocation failed with the following error:

 

Date        9/11/2017
Time        3:56:44 PM
Type/Level        Error
Computer Name        MORPHEUS.ABC
Event Code        113
Source        Microsoft-Windows-Hyper-V-VmSwitch
Description        Failed to allocate VMQ for NIC 43EABD46-7FEE-4444-AA88-9CC0D6B70A96–41B76058-E46B-48CD-A2E0-6A96370D7820 (Friendly Name: Network Adapter) on switch 67ABF766-9CAA-49D4-8F5E-09B4C8F5B3CB (Friendly Name: vSwitch1). Reason – Unknown. Status = {Operation Failed} The requested operation was unsuccessful.

 

 

 

Application Events:

 

  • Reviewed the Application log; no relevant events were found.

 

List of outdated drivers:

 

 

Time/Date String    Product Version      File Version     Company Name          File Description
5/25/2016 8:01      (7.13:65.105)        (7.13:65.105)    QLogic Corporation    QLogic 10 GigE VBD
3/4/2016 21:22      (10.0:11105.1001)    (6.603:6.0)      Avago Technologies    MEGASAS RAID Controller Driver for Windows
5/16/2016 2:28      (7.13:57.103)        (7.13:57.103)    QLogic Corporation    AMD64 BXND NDIS6.0 Driver
3/4/2016 21:46      (6.3:9600.16384)     (12.15:22.6)     Intel Corporation     Intel(R) Gigabit Adapter NDIS 6.x driver
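As a rough cross-check of the two hosts, the driver lists can be compared as sets. This is only a sketch: the entries are the (file description, file version) pairs listed for ARES and MORPHEUS in this report, and the variable names are hypothetical.

```python
# (file description, file version) pairs copied from the two driver lists.
MORPHEUS_DRIVERS = {
    ("QLogic 10 GigE VBD", "7.13:65.105"),
    ("MEGASAS RAID Controller Driver for Windows", "6.603:6.0"),
    ("AMD64 BXND NDIS6.0 Driver", "7.13:57.103"),
    ("Intel(R) Gigabit Adapter NDIS 6.x driver", "12.15:22.6"),
}

# ARES lists the same four drivers plus the Veeam minifilter.
ARES_DRIVERS = MORPHEUS_DRIVERS | {("CTK file system minifilter", "9.5:0.1015")}

only_on_ares = ARES_DRIVERS - MORPHEUS_DRIVERS
only_on_morpheus = MORPHEUS_DRIVERS - ARES_DRIVERS

print("Only on ARES:", sorted(only_on_ares))
print("Only on MORPHEUS:", sorted(only_on_morpheus))
```

With the data as listed, the four network and storage drivers are at identical versions on both hosts, and the Veeam CTK minifilter is the only entry that appears on ARES alone.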

 

__________________________________________________________________

 

 

 

 

Conclusion:

 

After analyzing the logs, the issue appears to be caused by over-utilized networking components, which produce the error ‘Insufficient system resources exist to complete the requested service.’ and prevent the virtual machine from completing the requested operation.

 

  1. Update the server BIOS using the following link if the installed version is outdated: https://downloads.dell.com/FOLDER04490198M/1/BIOS_Y4Y95_WN64_2.5.5.EXE

  2. Update the network adapter firmware and drivers to the latest versions, and add more physical NICs if possible so that the network does not get overwhelmed.

  3. Contact the hardware vendor: according to the Windows Server Catalog, Compellent Storage Center 7.1 is not supported on Windows Server 2016. If a more recent update is available, install it:

https://www.windowsservercatalog.com/item.aspx?idItem=14ae29e7-f59a-c2cf-5997-4427418ca750&bCatID=1282

  4. Update the HBA 
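For the BIOS recommendation in the list above, whether a host is behind can be checked by comparing version numbers field by field rather than as text. The sketch below assumes that the version in the download file name (2.5.5) is the version of that package; parse_version is a hypothetical helper.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.4.3' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


installed = parse_version("2.4.3")  # BIOS Version/Date reported by ARES and MORPHEUS
available = parse_version("2.5.5")  # assumption: taken from the download file name

if installed < available:
    print("BIOS is older than the linked package; an update applies.")
else:
    print("BIOS is already at or beyond the linked package.")
```

Comparing tuples of integers avoids the pitfall of string comparison, where "2.10.0" would sort before "2.9.9".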

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
