Issue Description:
Getting Event IDs 5120 and 5142 on cluster “ORL-HVCLUSTER-PR01”, running “Microsoft Windows Server 2012 R2 Datacenter”.
_________________________________________________________________________
System Information: ORL-220-VS-02
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ORL-220-VS-02
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
System Events:
- Analyzed the logs of node ORL-220-VS-02 and found that the VSS service entered the running state, which generally indicates that a VSS operation (such as a backup) was running in the background.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:46:40 PM | Information | ORL-220-VS-02.ntm.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state.
2/10/2017 | 11:47:08 PM | Error | ORL-220-VS-02.ntm.org | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Virtual Machine ST-NETSCALER-01’ of type ‘Virtual Machine’ in clustered role ‘ST-NETSCALER-01’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.
- Checked the logs around 11:54 PM and found that the VMs went into a failed state. This is probably because the CSV became inaccessible on node ORL-220-VS-03 around 11:36:49 PM.
2/10/2017 | 11:54:00 PM | Error | ORL-220-VS-02.ntm.org | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Virtual Machine ST-NETSCALER-01’ of type ‘Virtual Machine’ in clustered role ‘ST-NETSCALER-01’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.
2/10/2017 | 11:54:00 PM | Error | ORL-220-VS-02.ntm.org | 1205 | Microsoft-Windows-FailoverClustering | The Cluster service failed to bring clustered role ‘ST-NETSCALER-01’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
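The correlation described above (the CSV pause on ORL-220-VS-03 preceding the VM failures on ORL-220-VS-02) can be checked by merging events from both nodes into a single time-ordered list. The following is only a minimal sketch: the tuples are copied by hand from the event tables in this report, the summaries are abbreviated, and the helper names (`parse_when`, `build_timeline`) are made up for illustration.

```python
from datetime import datetime

# (node, date, time, event_id, abbreviated summary), copied from this report's tables.
EVENTS = [
    ("ORL-220-VS-02", "2/10/2017", "11:46:40 PM", 7036, "VSS service entered the running state"),
    ("ORL-220-VS-02", "2/10/2017", "11:47:08 PM", 1069, "VM resource ST-NETSCALER-01 failed"),
    ("ORL-220-VS-02", "2/10/2017", "11:54:00 PM", 1069, "VM resource ST-NETSCALER-01 failed"),
    ("ORL-220-VS-02", "2/10/2017", "11:54:00 PM", 1205, "Role ST-NETSCALER-01 not brought online"),
    ("ORL-220-VS-03", "2/10/2017", "11:36:01 PM", 7036, "VSS service entered the running state"),
    ("ORL-220-VS-03", "2/10/2017", "11:36:49 PM", 5120, "CSV ST-General-VM01 paused (c000020c)"),
]


def parse_when(date_text, time_text):
    """Parse the M/D/YYYY and 12-hour clock format used in the event tables."""
    return datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %I:%M:%S %p")


def build_timeline(events):
    """Return the events sorted by timestamp (stable for identical timestamps)."""
    return sorted(events, key=lambda e: parse_when(e[1], e[2]))


timeline = build_timeline(EVENTS)
for node, date_text, time_text, event_id, summary in timeline:
    print(f"{date_text} {time_text}  {node}  {event_id}  {summary}")

# Seconds between the first CSV pause (5120) and the first VM failure (1069).
first_5120 = next(e for e in timeline if e[3] == 5120)
first_1069 = next(e for e in timeline if e[3] == 1069)
gap_seconds = (parse_when(*first_1069[1:3]) - parse_when(*first_5120[1:3])).total_seconds()
print(f"Gap between first 5120 and first 1069: {gap_seconds:.0f} s")
```

Sorting by parsed timestamps rather than by text avoids mis-ordering entries that cross midnight or mix AM and PM.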
Cluster Events:
- Found cluster role-movement activity around 11:39:32 PM.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘SCVMM ST-ADMIN-01 Resources’ is moving to cluster node ‘ORL-220-VS-02’.
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state Offline to state OnlineCallIssued.
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state Offline to state WaitingToComeOnline. Cluster resource ‘SCVMM ST-ADMIN-01’ is waiting on the following resources: SCVMM ST-ADMIN-01 Configuration.
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state OnlineCallIssued to state OnlinePending.
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state OnlinePending to state Online.
_____________________________________________________________________________________
System Information: ORL-220-VS-03
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ORL-220-VS-03
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
Application Events:
- Analyzed the logs from node ORL-220-VS-03 starting around 11:36 PM and found events showing the VSS service in operation.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:06:29 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/10/2017 | 11:10:55 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/10/2017 | 11:17:06 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/11/2017 | 3:03:28 AM | Error | ORL-220-VS-03.ntm.org | 257 | Microsoft-Windows-Defrag | The volume ST-General-VM01 (C:\ClusterStorage\ST-General-VM01) was not optimized because an error was encountered: The process cannot access the file because it is being used by another process. (0x80070020)
- Found event ID 257, which indicates that the Cluster Shared Volume was in use by another process. This generally means the volume was being accessed by another application (backup), so the defrag operation could not continue.
2/10/2017 | 11:36:01 PM | Information | ORL-220-VS-03.ntm.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state.
2/10/2017 | 11:36:49 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ST-General-VM01’ (‘ST-General-VM01’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
- After this, the Cluster Shared Volume became inaccessible and later went into a failed state.
2/10/2017 | 11:36:49 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ST-General-VM01’ (‘ST-General-VM01’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
2/10/2017 | 11:42:40 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘USHQ-FS-02-H’ (‘USHQ-FS-02-H’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
2/10/2017 | 11:43:20 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘USHQ-FS-02-H’ (‘USHQ-FS-02-H’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
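The ‘(c000020c)’ value in the 5120 events is an NTSTATUS code. As a rough sketch, it can be split into its standard bit fields as shown below. The one-entry name table is an assumption limited to the single code seen in this report (0xC000020C is defined in ntstatus.h as STATUS_CONNECTION_DISCONNECTED); the function name `decode_ntstatus` is made up for illustration.

```python
SEVERITY_NAMES = {0: "SUCCESS", 1: "INFORMATIONAL", 2: "WARNING", 3: "ERROR"}

# Only the code that appears in this report; not a complete table.
KNOWN_STATUS = {0xC000020C: "STATUS_CONNECTION_DISCONNECTED"}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS into severity, customer flag, facility and code."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],  # top two bits
        "customer": bool((value >> 29) & 0x1),            # set for non-Microsoft codes
        "facility": (value >> 16) & 0xFFF,                # 12-bit facility
        "code": value & 0xFFFF,                           # low 16 bits
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


status = int("c000020c", 16)  # the text between parentheses in event 5120
decoded = decode_ntstatus(status)
print(decoded)
```

An error-severity code with this name is consistent with the event text, which says I/O is queued until a path to the volume is reestablished.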
System Events:
- Analyzed the logs but was not able to find anything specific related to the issue.
Cluster Events:
- Verified the cluster logs and found that the movement of clustered virtual machines started around 12:16 PM.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-03.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘SCVMM ST-ADMIN-01 Resources’ is moving to cluster node ‘ORL-220-VS-02’.
2/10/2017 | 11:39:59 PM | Information | ORL-220-VS-03.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘ST-ADMIN-02’ is moving to cluster node ‘ORL-220-VS-04’.
2/10/2017 | 11:42:26 PM | Information | ORL-220-VS-03.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state Online to state ProcessingFailure.
2/10/2017 | 11:42:26 PM | Information | ORL-220-VS-03.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state ProcessingFailure to state WaitingToTerminate. Cluster resource ‘ST-General-VM03’ is waiting on the following resources: .
____________________________________________________________________________________________
System Information: ORL-220-VS-04
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name ORL-220-VS-04
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013
Application Events:
- Checked the events and found the VSS service repeatedly shutting down due to idle timeout.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:17:11 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/10/2017 | 11:39:02 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/10/2017 | 11:42:56 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again.
2/10/2017 | 11:50:19 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout.
Cluster Events:
- Found events for the resource movement around 12:16 PM, but did not see any errors.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
2/10/2017 | 11:49:56 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state Offline to state OnlineCallIssued.
2/10/2017 | 11:50:01 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state OnlineCallIssued to state OnlinePending.
2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state OnlinePending to state Online.
2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 1201 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ online.
______________________________________________________________________________________________
Conclusion:
- As discussed, you mentioned that the issue started after two simultaneous backups were initiated. When a backup starts, the filter driver associated with the backup application takes an exclusive handle on the volume, which is why we typically see errors like:
2/11/2017 | 3:03:28 AM | Error | ORL-220-VS-03.ntm.org | 257 | Microsoft-Windows-Defrag | The volume ST-General-VM01 (C:\ClusterStorage\ST-General-VM01) was not optimized because an error was encountered: The process cannot access the file because it is being used by another process. (0x80070020)
- The error code decodes as follows:
\err(vista).exe’ 0x80070020
# for hex 0x80070020 / decimal -2147024864
STIERR_SHARING_VIOLATION stierr.h
# as an HRESULT: Severity: FAILURE (1), FACILITY_WIN32 (0x7), Code 0x20
# for hex 0x20 / decimal 32
ERROR_SHARING_VIOLATION winerror.h
# The process cannot access the file because it is being used by another process.
# 2 matches found for “0x80070020”
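The lookup output above splits the HRESULT into severity, facility, and code. The same arithmetic can be reproduced with a few bit operations. This is a minimal sketch; the function names are made up for illustration and no name lookup is attempted.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    value &= 0xFFFFFFFF               # also accept the signed decimal form
    severity = (value >> 31) & 0x1    # 1 = FAILURE, 0 = SUCCESS
    facility = (value >> 16) & 0x7FF  # 11-bit facility; 0x7 is FACILITY_WIN32
    code = value & 0xFFFF             # low 16 bits: the Win32 error code
    return severity, facility, code


def to_signed32(value):
    """Return the signed 32-bit decimal form that some tools print."""
    return value - 0x100000000 if value & 0x80000000 else value


severity, facility, code = decode_hresult(0x80070020)
print(severity, hex(facility), hex(code), to_signed32(0x80070020))
```

For 0x80070020 this yields severity 1, facility 0x7, and code 0x20 (decimal 32), which matches the ERROR_SHARING_VIOLATION entry in the output above.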
During this time, if another application's filter driver tries to access the same volume, the Cluster Shared Volume can become inaccessible, and in some cases the entire CSV can go offline. This can be fixed by restarting the node that owned the resource.
- Based on our discussion, I recommend not running multiple backup applications at the same time, because two filter drivers would then be operating on the same CSV at the same time.
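One way to follow this recommendation is to check planned backup windows for overlaps on the same CSV before scheduling them. The sketch below is purely illustrative: the job names (BackupApp-A, BackupApp-B, BackupApp-C) and the time windows are hypothetical and are not taken from the logs; only the CSV names come from this report.

```python
from datetime import datetime
from itertools import combinations

# Hypothetical schedule; names and windows are assumptions for illustration.
JOBS = [
    {"name": "BackupApp-A", "csv": "ST-General-VM01",
     "start": datetime(2017, 2, 10, 23, 30), "end": datetime(2017, 2, 11, 0, 30)},
    {"name": "BackupApp-B", "csv": "ST-General-VM01",
     "start": datetime(2017, 2, 10, 23, 36), "end": datetime(2017, 2, 11, 0, 15)},
    {"name": "BackupApp-C", "csv": "USHQ-FS-02-H",
     "start": datetime(2017, 2, 11, 1, 0), "end": datetime(2017, 2, 11, 2, 0)},
]


def overlapping_backups(jobs):
    """Return (job_a, job_b, csv) for every pair whose windows overlap on one CSV."""
    conflicts = []
    for a, b in combinations(jobs, 2):
        same_volume = a["csv"] == b["csv"]
        # Two half-open intervals overlap when each starts before the other ends.
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        if same_volume and overlap:
            conflicts.append((a["name"], b["name"], a["csv"]))
    return conflicts


for job_a, job_b, csv in overlapping_backups(JOBS):
    print(f"Conflict on {csv}: {job_a} overlaps {job_b}")
```

A check like this only catches conflicts in the planned schedule; a job that overruns its window would still need to be detected at run time.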