Issue Description:
We have a four-node cluster, “ORL-HVCLUSTER-PR01”, running “Microsoft Windows Server 2012 Standard”, on which Event IDs 5120 and 5142 were logged on 27 January 2017: “Cluster Shared Volume ‘USHQ-FS-02-G’ (‘USHQ-FS-02-G’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.” This caused the VM running on the CSV to crash.
_________________________________________________________________________________________
System Information: ABC-220-XY-02
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ABC-220-XY-02
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
Application Events:
- Reviewed the Application events and found the following events related to the VSS service shutting down.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 6:52:07 PM | Information | ABC-220-XY-02.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
1/27/2017 | 6:54:21 PM | Warning | ABC-220-XY-02.abc.org | 1 | Navisphere Agent | N/A |
System Events:
- Reviewed the System events around 8:30 PM and found possible network-related issues, as MSDSM is logging events related to DSM driver operations. However, we did not see any events related to the CSV entering a paused state on this node.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 8:29:27 PM | Information | ABC-220-XY-02.abc.org | 37 | mpio | Microsoft DSM is attempting an operation on \Device\MPIODisk2. The Type is noted in the dump data. |
1/27/2017 | 8:29:50 PM | Warning | ABC-220-XY-02.abc.org | 27 | Microsoft-Windows-Hyper-V-VmSwitch | Networking driver in AB-NETSCALER-02 loaded but has a different version from the server. Server version 5.0 Client version 3.2 (Virtual machine ID C196386A-93CB-4085-AD4F-E90A2A89C301). The device will work, but this is an unsupported configuration. This means that technical support will not be provided until this problem is resolved. To fix this problem, upgrade the integration services. To upgrade, connect to the virtual machine and select Insert Integration Services Setup Disk from the Action menu. |
1/27/2017 | 8:30:24 PM | Information | ABC-220-XY-02.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
1/27/2017 | 8:30:24 PM | Information | ABC-220-XY-02.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |
1/27/2017 | 8:30:26 PM | Information | ABC-220-XY-02.abc.org | 45 | mpio | Microsoft DSM attempted an operation on \Device\MPIODisk8. This has occurred 1 times in the past 60 seconds. The Type is noted in the dump data. |
- Reviewing the cluster events, we found that the issue started around 4:35:56 PM, when network resources began going offline. DCOM communication also failed for the Backup Exec application, which confirms that a backup job was running in the background.
1/27/2017 | 4:27:27 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
1/27/2017 | 4:28:03 PM | Information | ABC-220-XY-02.abc.org | 25 | Microsoft-Windows-Hyper-V-VmSwitch | The MAC address 00-15-5D-00-65-42 has moved from port 99F07BF3-34BA-4F9F-9373-09ABF1910BC2 (Friendly Name: ) to port 503A7DEB-07C8-40D3-A1ED-48B0A2DF53B1 (Friendly Name: 47352527-915f-4c20-afc4-4bb9b6f2bb43). |
1/27/2017 | 4:28:05 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
1/27/2017 | 4:28:05 PM | Information | ABC-220-XY-02.abc.org | 98 | Microsoft-Windows-Ntfs | Volume Volume2 (\Device\HarddiskVolume9) is healthy. No action is needed. |
1/27/2017 | 4:28:48 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
1/27/2017 | 4:29:26 PM | Error | ABC-220-XY-02.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-01 using any of the configured protocols; requested by PID c08 (C:\PROGRAM FILES\SYMANTEC\BACKUP EXEC\RAWS\beremote.exe). |
Cluster Events:
- The cluster events show resources failing on the cluster network, after which the resources started to go offline.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Cluster IP Address’ in clustered role ‘Cluster Group’ has transitioned from state WaitingToGoOffline to state OfflineCallIssued. |
1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Cluster IP Address’ in clustered role ‘Cluster Group’ has transitioned from state OfflineCallIssued to state OfflineSavingCheckpoints. |
1/27/2017 | 4:35:56 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Volume1’ in clustered role ‘73e1bc32-e654-4319-a9c9-3d65bf5faa8b’ has transitioned from state OfflineCallIssued to state OfflinePending. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state Online to state WaitingToGoOffline. Cluster resource ‘Quorum’ is waiting on the following resources: . |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state WaitingToGoOffline to state OfflineCallIssued. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Volume1’ in clustered role ‘73e1bc32-e654-4319-a9c9-3d65bf5faa8b’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘USHQ-FS-02-I’ in clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘USHQ-FS-02-I’ in clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ has transitioned from state OfflineSavingCheckpoints to state Offline. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1204 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ offline. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘1b5eb61e-cc5f-4052-926f-b56a9e2ab886’ is moving to cluster node ‘ABC-220-XY-01’. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state OfflinePending to state OfflineSavingCheckpoints. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Quorum’ in clustered role ‘Cluster Group’ has transitioned from state OfflineSavingCheckpoints to state Offline. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1204 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘Cluster Group’ offline. |
1/27/2017 | 4:35:57 PM | Information | ABC-220-XY-02.abc.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘Cluster Group’ is moving to cluster node ‘ABC-220-XY-01’. |
1/27/2017 | 4:45:56 PM | Information | ABC-220-XY-02.abc.org | 1063 | Microsoft-Windows-FailoverClustering | The Cluster service was successfully stopped. |
List of outdated drivers:
Time/Date String | Product Version | File Version | Company Name | File Description |
10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |
______________________________________________________________________________________
System Information: ABC-220-XY-03
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ABC-220-XY-03
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
Application Events:
- Reviewed the Application events at the time of the issue but was not able to find anything specific to it, other than WMI not being able to communicate properly.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 8:15:08 PM | Information | ABC-220-XY-03.abc.org | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again. |
1/27/2017 | 8:17:57 PM | Information | ABC-220-XY-03.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
System Events:
- Reviewed the System logs and found nothing specific to the issue.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 9:02:52 PM | Error | ABC-220-XY-03.abc.org | 1 | VDS Basic Provider | Unexpected failure. Error code: 48F@01000003 |
1/27/2017 | 9:00:57 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
1/27/2017 | 9:01:32 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the running state. |
1/27/2017 | 9:02:42 PM | Information | ABC-220-XY-03.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |
List of outdated drivers:
Time/Date String | Product Version | File Version | Company Name | File Description |
10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |
_______________________________________________________________________________________
System Information: ABC-220-XY-04
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ABC-220-XY-04
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
Application Events:
- Found an event related to the backup failure: VSS was not able to create snapshots.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 9:10:19 PM | Error | ABC-220-XY-04.abc.org | 12293 | VSS | Volume Shadow Copy Service error: Error calling a routine on a Shadow Copy Provider {400a2ff4-5eb1-44b0-8a05-1fcac0bcf9ff}. Routine details EndPrepareSnapshots({938b8e8c-a941-47b9-ac3f-04305a40c5a1}) [hr = 0x8000ffff, Catastrophic failure ]. Operation: Executing Asynchronous Operation Context: Current State: DoSnapshotSet |
1/27/2017 | 8:18:02 PM | Information | ABC-220-XY-04.abc.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
System Events:
- The Cluster Shared Volume entered a paused state for the first time at around 8:23:32 PM.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
1/27/2017 | 8:23:32 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘Volume2’ (‘Volume2’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
1/27/2017 | 8:26:43 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘AB-General-VM02’ (‘AB-General-VM02’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
1/27/2017 | 8:28:26 PM | Error | ABC-220-XY-04.abc.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘Volume2’ (‘Volume2’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
- Looking up the error code c000020c with err.exe:
err(vista).exe c000020c
# for hex 0xc000020c / decimal -1073741300
STATUS_CONNECTION_DISCONNECTED ntstatus.h
# The transport connection is now disconnected.
# 1 matches found for “c000020c”
- The connection to the CSV was disconnected, which is why this event was logged.
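As a cross-check of the err.exe output, the hex code can be decoded by hand. The sketch below is illustrative Python and not part of the original analysis; the helper name ntstatus_fields is hypothetical. It converts the 32-bit value to its signed decimal form and splits out the severity, facility, and code bit fields of an NTSTATUS value.

```python
def ntstatus_fields(code: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields (hypothetical helper)."""
    code &= 0xFFFFFFFF
    # Interpret the top bit as a sign bit to get the decimal form err.exe prints.
    signed = code - (1 << 32) if code & 0x80000000 else code
    return {
        "hex": f"0x{code:08x}",
        "signed_decimal": signed,
        "severity": code >> 30,            # 0 success, 1 informational, 2 warning, 3 error
        "customer": (code >> 29) & 1,      # 0 means a Microsoft-defined code
        "facility": (code >> 16) & 0xFFF,
        "code": code & 0xFFFF,
    }


fields = ntstatus_fields(0xC000020C)
print(fields["hex"], fields["signed_decimal"], fields["severity"])
```

For 0xc000020c the signed decimal value is -1073741300, matching the err.exe output above, and the severity field is 3 (error).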
1/27/2017 | 8:24:43 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
1/27/2017 | 8:24:43 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The Microsoft Software Shadow Copy Provider service entered the running state. |
- We see the same set of events on this node: the backup application was not able to communicate properly, and as a result the backup failed.
1/27/2017 | 4:27:49 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:27:57 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the running state. |
1/27/2017 | 4:27:59 PM | Information | ABC-220-XY-04.abc.org | 25 | Microsoft-Windows-Hyper-V-VmSwitch | The MAC address 00-15-5D-00-65-39 has moved from port 3085CEB7-3748-4BC7-B7AE-E012A8751AC8 (Friendly Name: ) to port 8A189E20-C9D9-41A6-8171-BDEB400A8314 (Friendly Name: ). |
1/27/2017 | 4:27:59 PM | Information | ABC-220-XY-04.abc.org | 7036 | Service Control Manager | The WMI Performance Adapter service entered the stopped state. |
1/27/2017 | 4:28:05 PM | Information | ABC-220-XY-04.abc.org | 98 | Microsoft-Windows-Ntfs | Volume USHQ-FS-02-O (\Device\HarddiskVolume6) is healthy. No action is needed. |
1/27/2017 | 4:28:27 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:29:07 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:29:45 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
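The target and spacing of the DCOM failures above can be checked with a small script. This is a minimal sketch in Python, assuming the events are exported in the same pipe-delimited layout used in this report; parse_row, ROWS, targets, and gaps are illustrative names, and the sample rows are copied from the table above.

```python
from collections import Counter
from datetime import datetime
import re

# Sample rows copied from the System events of ABC-220-XY-04 in this report.
# A raw string keeps the backslashes in the Windows paths intact.
ROWS = r"""
1/27/2017 | 4:27:49 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:28:05 PM | Information | ABC-220-XY-04.abc.org | 98 | Microsoft-Windows-Ntfs | Volume USHQ-FS-02-O (\Device\HarddiskVolume6) is healthy. No action is needed. |
1/27/2017 | 4:28:27 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:29:07 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
1/27/2017 | 4:29:45 PM | Error | ABC-220-XY-04.abc.org | 10028 | Microsoft-Windows-DistributedCOM | DCOM was unable to communicate with the computer ABC-220-XY-03 using any of the configured protocols; requested by PID c0c (C:\Program Files\Symantec\Backup Exec\RAWS\beremote.exe). |
"""


def parse_row(line):
    """Parse one 'Date | Time | Level | Computer | Event Code | Source | Description |' row."""
    parts = [p.strip() for p in line.strip().rstrip("|").split("|")]
    date, clock, level, computer, event_id, source = parts[:6]
    return {
        "when": datetime.strptime(f"{date} {clock}", "%m/%d/%Y %I:%M:%S %p"),
        "level": level,
        "computer": computer,
        "event_id": int(event_id),
        "source": source,
        # Re-join in case the description itself contained a pipe character.
        "description": " | ".join(parts[6:]),
    }


events = sorted((parse_row(row) for row in ROWS.strip().splitlines()),
                key=lambda e: e["when"])
dcom = [e for e in events if e["event_id"] == 10028]

# Count which remote node each DCOM failure was trying to reach.
targets = Counter()
for e in dcom:
    match = re.search(r"with the computer (\S+)", e["description"])
    if match:
        targets[match.group(1)] += 1

# Seconds between consecutive DCOM failures.
gaps = [(b["when"] - a["when"]).total_seconds() for a, b in zip(dcom, dcom[1:])]

print(targets)
print(gaps)
```

For the sample rows, all four failures target ABC-220-XY-03 and are 38 to 40 seconds apart, which is consistent with a steady retry from beremote.exe rather than isolated errors.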
List of outdated drivers:
Time/Date String | Product Version | File Version | Company Name | File Description |
10/31/2012 14:38 | (2.2:0.17) | (2.2:0.17) | Cisco Systems, Inc. | Cisco VIC FCoE Storport Miniport Driver |
8/29/2014 14:59 | (2.2:111.0) | (2.2:111.0) | Symantec Corporation | Allows granular display of backups. |
5/3/2014 4:58 | (1.1:200.0) | (105.202:0.0) | Cisco Systems, Inc. | Cisco Nexus 1000V Virtual Switch Extension Driver |
10/31/2012 21:30 | (2.2:0.13) | (2.2:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver |
11/4/2010 14:33 | (4.2:0.58) | (4.0:1.58) | VMware, Inc. | VMware Virtual Storage Volume Driver |
________________________________________________________________________________________
Conclusion:
- After analyzing the logs, we can conclude that the issue started around 4:30 PM, when a backup job was initiated. During this backup operation, problems began with the networking component of the server. We were not able to find any Network Link Down or Network Failure events; however, services such as DCOM and WMI were not able to communicate with the other nodes of the cluster. This generally indicates that the network is not able to handle the backup traffic together with the client connections.
- Since the issue started with the backup operation, I recommend updating the Symantec application to the latest version.
- Also install the latest cluster-related updates by following this article: https://support.microsoft.com/en-us/help/2920151/recommended-hotfixes-and-updates-for-windows-server-2012-r2-based-failover-clusters
- Investigate network timeouts, latency, and packet drops with the help of the in-house networking team.
Please note: This step is the most critical when dealing with network connectivity issues.
Investigation of Network Issues:
We need to investigate the network connectivity issues with the help of the in-house networking team.
To avoid this issue in the future, the most critical step is to diagnose and investigate the persistent network connectivity issue on the cluster networks.
We need to check the network adapters, cables, and network configuration for the networks that connect the nodes.
We also need to check the hubs, switches, and bridges in the networks that connect the nodes.
We need to check for switch delays and proxy ARP with the help of the in-house networking team.
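As a starting point for the latency and packet-drop checks, node-to-node reachability can be sampled while a backup job is running. The following is a rough sketch in Python under stated assumptions: probe_tcp and summarize are hypothetical helper names, and TCP port 135 (the RPC endpoint mapper that DCOM relies on) is chosen only as an example target. It does not replace a packet capture or the switch-side checks by the networking team.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=2.0):
    """Try `attempts` TCP connects; return latency in ms per attempt (None on failure)."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers timeouts, refused connections, and name resolution failures.
            results.append(None)
    return results


def summarize(results):
    """Reduce a list of per-attempt latencies to failure count and basic statistics."""
    ok = [r for r in results if r is not None]
    return {
        "sent": len(results),
        "failed": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }


# Example use from one cluster node against another (node name taken from this report):
#   print(summarize(probe_tcp("ABC-220-XY-03", 135)))
```

Running the probe from each node against the others, once during a backup window and once outside it, gives a simple baseline: a rise in failed attempts or in max_ms during the backup would support the conclusion that the network cannot carry the backup traffic alongside cluster and client traffic.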