RCA – 16 – CSV Went into Paused State

Issue Description:

 

We have a four-node cluster, “ORL-HVCLUSTER-PR01”, running “Microsoft Windows Server 2012 Standard”, on which we received Event IDs 5120 and 5142: “Cluster Shared Volume ‘USHQ-FS-02-G’ (‘USHQ-FS-02-G’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.” This occurred on 27 Jan 2017 and caused the VMs running on the CSV to crash.

 

_________________________________________________________________________________________

 

System Information: ABC-220-XY-02

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ABC-220-XY-02

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU       

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

Application Events:

 

  • Reviewed the Application events and found the following events, including the VSS service shutting down.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 6:52:07 PM | Information | ABC-220-XY-02.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 1/27/2017 | 6:54:21 PM | Warning | ABC-220-XY-02.abc.org | 1 | Navisphere Agent | N/A |

 

 

System Events:

 

  • Reviewed the System events around 8:30 PM and found network-related issues: MSDSM logged events for DSM driver operations on the MPIO disks. However, we did not see any events for the CSV entering a paused state on this node.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 8:29:27 PM | Information | ABC-220-XY-02.abc.org | 37 | mpio | Microsoft DSM is attempting an operation on \Device\MPIODisk2. The Type is noted in the dump data. |
| 1/27/2017 | 8:29:50 PM | Warning | ABC-220-XY-02.abc.org | 27 | Microsoft-Windows-Hyper-V-VmSwitch | Networking driver in AB-NETSCALER-02 loaded but has a different version from the server. Server version 5.0  Client version 3.2 (Virtual machine ID C196386A-93CB-4085-AD4F-E90A2A89C301). The device will work, but this is an unsupported configuration. This means that technical support will not be provided until this problem is resolved. To fix this problem, upgrade the integration services. To upgrade, connect to the virtual machine and select Insert Integration Services Setup Disk from the Action menu. |
| 1/27/2017 | 8:30:24 PM | Information | ABC-220-XY-02.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
| 1/27/2017 | 8:30:24 PM | Information | ABC-220-XY-02.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |
| 1/27/2017 | 8:30:26 PM | Information | ABC-220-XY-02.abc.org | 45 | mpio | Microsoft DSM attempted an operation on \Device\MPIODisk8.  This has occurred 1 times in the past 60 seconds. The Type is noted in the dump data. |

 

 

  • Reviewing the cluster events, we found that the issue started around 4:35:56 PM, when network resources began going offline. Shortly before that, from 4:27 PM, DCOM requests from the Backup Exec remote agent (beremote.exe) failed, which confirms that a backup job was running in the background.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 4:27:27 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
| 1/27/2017 | 4:28:03 PM | Information | ABC-220-XY-02.abc.org | 25 | Microsoft-Windows-Hyper-V-VmSwitch | The MAC address 00-15-5D-00-65-42 has moved from port 99F07BF3-34BA-4F9F-9373-09ABF1910BC2 (Friendly Name: ) to port 503A7DEB-07C8-40D3-A1ED-48B0A2DF53B1 (Friendly Name: 47352527-915f-4c20-afc4-4bb9b6f2bb43). |
| 1/27/2017 | 4:28:05 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
| 1/27/2017 | 4:28:05 PM | Information | ABC-220-XY-02.abc.org | 98 | Microsoft-Windows-Ntfs | Volume Volume2 (\Device\HarddiskVolume9) is healthy.  No action is needed. |
| 1/27/2017 | 4:28:48 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
| 1/27/2017 | 4:29:26 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |

 

 

 

Cluster Events:

 

  • The cluster events show resources failing on the cluster network, after which the resources started going offline.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Cluster IP Address’ in clustered role ‘Cluster Group’ has transitioned from state WaitingToGoOffline to state OfflineCallIssued. |
| 1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Cluster IP Address’ in clustered role ‘Cluster Group’ has transitioned from state OfflineCallIssued to state OfflineSavingCheckpoints. |
| 1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Volume1’ in clustered role ‘73e1bc32-e654-4319-a9c9-3d65bf5faa8b’ has transitioned from state OfflineCallIssued to state OfflinePending. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state Online to state WaitingToGoOffline. Cluster resource ‘Quorum’ is waiting on the following resources: . |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state WaitingToGoOffline to state OfflineCallIssued. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Volume1’ in clustered role ‘73e1bc32-e654-4319-a9c9-3d65bf5faa8b’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘USHQ-FS-02-I’ in clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘USHQ-FS-02-I’ in clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ has transitioned from state OfflineSavingCheckpoints to state Offline. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1204 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ offline. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ is moving to cluster node ‘ABC-220-XY-01’. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state OfflineSavingCheckpoints to state Offline. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1204 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘Cluster Group’ offline. |
| 1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘Cluster Group’ is moving to cluster node ‘ABC-220-XY-01’. |
| 1/27/2017 | 4:45:56 PM | Information | ABC-220-XY-02.abc.org | 1063 | Microsoft-Windows-FailoverClustering | The Cluster service was successfully stopped. |

 

 

 

List of outdated drivers:

 

 

| Time/Date String | Product Version | File Version | Company Name | File Description |
|---|---|---|---|---|
| 10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
| 8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
| 5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
| 10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
| 11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |

 

 

______________________________________________________________________________________

 

 

System Information: ABC-220-XY-03

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ABC-220-XY-03

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU       

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

Application Events:

 

  • Reviewed the Application events at the time of the issue but did not find anything specific to it, other than WMI being unable to communicate properly.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 8:15:08 PM | Information | ABC-220-XY-03.abc.org | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again. |
| 1/27/2017 | 8:17:57 PM | Information | ABC-220-XY-03.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |

 

 

System Events:

 

  • Reviewed the System logs and found nothing specific to the issue.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 9:02:52 PM | Error | ABC-220-XY-03.abc.org | 1 | VDS Basic Provider | Unexpected failure. Error code: 48F@01000003 |
| 1/27/2017 | 9:00:57 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
| 1/27/2017 | 9:01:32 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the running state. |
| 1/27/2017 | 9:02:42 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |

 

List of outdated drivers:

 

 

| Time/Date String | Product Version | File Version | Company Name | File Description |
|---|---|---|---|---|
| 10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
| 8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
| 5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
| 10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
| 11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |

 

 

_______________________________________________________________________________________

 

 

System Information: ABC-220-XY-04

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ABC-220-XY-04

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU       

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

Application Events:

 

  • Found an event related to the backup failure: VSS was not able to create snapshots.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 9:10:19 PM | Error | ABC-220-XY-04.abc.org | 12293 | VSS | Volume Shadow Copy Service error: Error calling a routine on a Shadow Copy Provider {400a2ff4-5eb1-44b0-8a05-1fcac0bcf9ff}. Routine details EndPrepareSnapshots({938b8e8c-a941-47b9-ac3f-04305a40c5a1}) [hr = 0x8000ffff, Catastrophic failure ].  Operation:    Executing Asynchronous Operation Context:    Current State: DoSnapshotSet |
| 1/27/2017 | 8:18:02 PM | Information | ABC-220-XY-04.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |

 

 

System Events:

 

  • The Cluster Shared Volume first entered a paused state at 8:23:32 PM.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 8:23:32 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘Volume2’ (‘Volume2’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
| 1/27/2017 | 8:26:43 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘AB-General-VM02’ (‘AB-General-VM02’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
| 1/27/2017 | 8:28:26 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘Volume2’ (‘Volume2’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
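When many Event 5120 entries have to be reviewed, it can help to pull the volume name and the status code out of the description text programmatically. The following is a minimal Python sketch, not part of the original investigation; the pattern and the helper name `parse_5120` are our own assumptions, and the sample string is copied from the first event above.

```python
import re

# Pattern for the Event 5120 description text shown in the events above.
# Both curly and straight quotes are accepted, since exported logs vary.
EVENT_5120 = re.compile(
    r"Cluster Shared Volume [‘'](?P<volume>[^’']+)[’'] .*?"
    r"paused state because of [‘']\((?P<status>[0-9a-fA-F]{8})\)[’']"
)


def parse_5120(description):
    """Return (volume name, status code as int), or None if the text does not match."""
    match = EVENT_5120.search(description)
    if match is None:
        return None
    return match.group("volume"), int(match.group("status"), 16)


sample = (
    "Cluster Shared Volume ‘Volume2’ (‘Volume2’) has entered a paused state "
    "because of ‘(c000020c)’. All I/O will temporarily be queued until a path "
    "to the volume is reestablished."
)

result = parse_5120(sample)
if result is not None:
    volume, status = result
    print(f"{volume}: 0x{status:08x}")
```

Grouping the parsed results by volume or by status code makes it easier to see whether every pause was caused by the same error.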

 

  • Looking up the error code c000020c with the err utility shows that it maps to STATUS_CONNECTION_DISCONNECTED:

 

'err(vista).exe' c000020c
# for hex 0xc000020c / decimal -1073741300
  STATUS_CONNECTION_DISCONNECTED                                 ntstatus.h
# The transport connection is now disconnected.
# 1 matches found for "c000020c"
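The hex-to-decimal conversion in the lookup output can be reproduced without the err utility. The sketch below assumes the standard 32-bit NTSTATUS layout (two severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code); the function name `decode_ntstatus` is ours and only illustrates the arithmetic. It does not map the value to its symbolic name, which still requires ntstatus.h or a lookup tool.

```python
def decode_ntstatus(code):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severity_names = {0: "SUCCESS", 1: "INFORMATIONAL", 2: "WARNING", 3: "ERROR"}
    # Values with the top bit set are reported as negative signed 32-bit integers.
    signed = code - (1 << 32) if code & 0x80000000 else code
    return {
        "hex": f"0x{code:08x}",
        "signed_decimal": signed,
        "severity": severity_names[(code >> 30) & 0x3],
        "customer_defined": bool((code >> 29) & 0x1),
        "facility": (code >> 16) & 0xFFF,
        "code": code & 0xFFFF,
    }


info = decode_ntstatus(0xC000020C)
print(info["hex"], info["signed_decimal"], info["severity"])
```

For 0xc000020c this gives the same signed decimal value as the lookup output, -1073741300, with an error severity.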

 

  • The node lost its connection to the CSV, which is why Event 5120 was logged.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 8:24:43 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
| 1/27/2017 | 8:24:43 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |

 

 

  • This node shows the same set of events: the backup application was unable to communicate properly, which caused the backup to fail.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
|---|---|---|---|---|---|---|
| 1/27/2017 | 4:27:49 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
| 1/27/2017 | 4:27:57 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the running state. |
| 1/27/2017 | 4:27:59 PM | Information | ABC-220-XY-04.abc.org | 25 | Microsoft-Windows-Hyper-V-VmSwitch | The MAC address 00-15-5D-00-65-39 has moved from port 3085CEB7-3748-4BC7-B7AE-E012A8751AC8 (Friendly Name: ) to port 8A189E20-C9D9-41A6-8171-BDEB400A8314 (Friendly Name: ). |
| 1/27/2017 | 4:27:59 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the stopped state. |
| 1/27/2017 | 4:28:05 PM | Information | ABC-220-XY-04.abc.org | 98 | Microsoft-Windows-Ntfs | Volume USHQ-FS-02-O (\Device\HarddiskVolume6) is healthy.  No action is needed. |
| 1/27/2017 | 4:28:27 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
| 1/27/2017 | 4:29:07 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
| 1/27/2017 | 4:29:45 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |

 

 

 

List of outdated drivers:

 

 

| Time/Date String | Product Version | File Version | Company Name | File Description |
|---|---|---|---|---|
| 10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
| 8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
| 5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
| 10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
| 11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |

 

________________________________________________________________________________________

 

Conclusion:

 

  • After analyzing the logs, we conclude that the issue started around 4:30 PM, when a backup job was initiated. During this backup operation, problems began in the server's networking component. We did not find any Network Link Down or Network Failure events, but services such as DCOM and WMI were unable to communicate with the other nodes of the cluster. This generally indicates that the network could not handle the backup traffic on top of the client connections.

 

  • Since the issue started with the backup operation, I recommend updating the Symantec Backup Exec application to the latest version.

 

 

  • Investigate network timeouts, latency, and packet drops with the help of the in-house networking team.

Please note: this step is the most critical one when dealing with network connectivity issues.

           Investigation of network issues:

To avoid this issue in the future, the most important step is to diagnose the persistent network connectivity problem on the cluster networks, working with the in-house networking team:

  • Check the network adapters, cables, and network configuration for the networks that connect the nodes.

  • Check the hubs, switches, or bridges in the networks that connect the nodes.

  • Check for switch delays and proxy ARP.
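To make the sequence described in the conclusion easier to follow, the key events from the per-node logs above can be merged into a single timeline. This is a minimal Python sketch under our own assumptions: the list holds only a hand-picked subset of the events already quoted in this report, and the one-line summaries are ours, not the original log text.

```python
from datetime import datetime

# (node, timestamp, event ID, summary) taken from the events quoted in this report.
EVENTS = [
    ("ABC-220-XY-04", "1/27/2017 8:23:32 PM", 5120, "CSV 'Volume2' entered a paused state"),
    ("ABC-220-XY-02", "1/27/2017 4:35:56 PM", 1637, "Cluster resources start going offline"),
    ("ABC-220-XY-04", "1/27/2017 9:10:19 PM", 12293, "VSS snapshot creation failed"),
    ("ABC-220-XY-02", "1/27/2017 4:27:27 PM", 10028, "DCOM cannot reach ABC-220-XY-01"),
    ("ABC-220-XY-04", "1/27/2017 4:27:49 PM", 10028, "DCOM cannot reach ABC-220-XY-03"),
    ("ABC-220-XY-02", "1/27/2017 4:45:56 PM", 1063, "Cluster service stopped"),
]


def parse_time(text):
    """Parse the date and time format used in the event tables."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def build_timeline(events):
    """Return the events sorted by time, regardless of which node logged them."""
    rows = [(parse_time(when), node, event_id, summary)
            for node, when, event_id, summary in events]
    return sorted(rows, key=lambda row: row[0])


timeline = build_timeline(EVENTS)
for when, node, event_id, summary in timeline:
    print(f"{when:%H:%M:%S}  {node}  {event_id:>5}  {summary}")

# Time between the first DCOM communication error and the first CSV pause.
first_dcom = next(row[0] for row in timeline if row[2] == 10028)
first_pause = next(row[0] for row in timeline if row[2] == 5120)
gap = first_pause - first_dcom
print("DCOM error to first CSV pause:", gap)
```

Sorting by time rather than by node shows that the DCOM communication errors appear on both nodes within seconds of each other, hours before the first CSV pause.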

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
