RCA – 17 – CSV in Paused State During Backup

Issue Description:

 

Event IDs 5120 and 5142 are being logged on cluster ORL-HVCLUSTER-PR01, which runs Microsoft Windows Server 2012 R2 Datacenter.

_________________________________________________________________________

 

System Information: ORL-220-VS-02

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ORL-220-VS-02

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU        

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

System Events:

 

  • Analyzed the logs of node ORL-220-VS-02 and found that the Volume Shadow Copy (VSS) service entered the running state, which generally indicates that a VSS operation, such as a backup, was running in the background.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:46:40 PM | Information | ORL-220-VS-02.ntm.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
| 2/10/2017 | 11:47:08 PM | Error | ORL-220-VS-02.ntm.org | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Virtual Machine ST-NETSCALER-01’ of type ‘Virtual Machine’ in clustered role ‘ST-NETSCALER-01’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. |

 

  • Checked the logs around 11:54 PM and found that the VMs went into a failed state, probably because the CSV became inaccessible on node ORL-220-VS-03 at around 11:36:49 PM.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:54:00 PM | Error | ORL-220-VS-02.ntm.org | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘Virtual Machine ST-NETSCALER-01’ of type ‘Virtual Machine’ in clustered role ‘ST-NETSCALER-01’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. |
| 2/10/2017 | 11:54:00 PM | Error | ORL-220-VS-02.ntm.org | 1205 | Microsoft-Windows-FailoverClustering | The Cluster service failed to bring clustered role ‘ST-NETSCALER-01’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role. |
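
The ordering behind this inference can be checked by merging the events quoted in this report into one timeline. The following is a minimal Python sketch, not part of the original analysis: the tuple layout and the `parse_ts` helper are illustrative assumptions, and the rows are copied from the event tables in this report (the 5120 row comes from node ORL-220-VS-03).

```python
from datetime import datetime

# Events copied from the tables in this report: (date, time, node, event ID).
EVENTS = [
    ("2/10/2017", "11:54:00 PM", "ORL-220-VS-02", 1205),
    ("2/10/2017", "11:47:08 PM", "ORL-220-VS-02", 1069),
    ("2/10/2017", "11:36:49 PM", "ORL-220-VS-03", 5120),
    ("2/10/2017", "11:46:40 PM", "ORL-220-VS-02", 7036),
]


def parse_ts(date: str, time: str) -> datetime:
    """Parse the US-style date and 12-hour time used in the event tables."""
    return datetime.strptime(f"{date} {time}", "%m/%d/%Y %I:%M:%S %p")


# Merge the events from all nodes into one chronological timeline.
timeline = sorted(EVENTS, key=lambda e: parse_ts(e[0], e[1]))

# Locate the first CSV pause (5120) and the first VM resource failure (1069).
first_pause = next(e for e in timeline if e[3] == 5120)
first_vm_failure = next(e for e in timeline if e[3] == 1069)
gap = parse_ts(*first_vm_failure[:2]) - parse_ts(*first_pause[:2])

for date, time, node, event_id in timeline:
    print(f"{date} {time}  {node}  {event_id}")
print(f"CSV pause precedes first VM failure by {gap}")
```

With these rows, the 5120 pause on ORL-220-VS-03 sorts first, roughly ten minutes before the first 1069 failure on ORL-220-VS-02. That ordering is consistent with the CSV pause being the trigger, although timing alone does not prove causation.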

 

 

Cluster Events:

 

  • Found a clustered role being moved to this node at 11:39:32 PM.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘SCVMM ST-ADMIN-01 Resources’ is moving to cluster node ‘ORL-220-VS-02’. |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state Offline to state OnlineCallIssued. |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state Offline to state WaitingToComeOnline. Cluster resource ‘SCVMM ST-ADMIN-01’ is waiting on the following resources: SCVMM ST-ADMIN-01 Configuration. |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state OnlineCallIssued to state OnlinePending. |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-02.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘SCVMM ST-ADMIN-01 Configuration’ in clustered role ‘SCVMM ST-ADMIN-01 Resources’ has transitioned from state OnlinePending to state Online. |

 

 

_____________________________________________________________________________________ 

System Information: ORL-220-VS-03

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ORL-220-VS-03

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU        

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

Application Events:

 

  • Analyzed the logs from node ORL-220-VS-03 around 11:36 PM and found events showing the VSS service in operation.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:06:29 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/10/2017 | 11:10:55 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/10/2017 | 11:17:06 PM | Information | ORL-220-VS-03.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/11/2017 | 3:03:28 AM | Error | ORL-220-VS-03.ntm.org | 257 | Microsoft-Windows-Defrag | The volume ST-General-VM01 (C:\ClusterStorage\ST-General-VM01) was not optimized because an error was encountered: The process cannot access the file because it is being used by another process. (0x80070020) |

 

  • Found Event ID 257, which shows that the Cluster Shared Volume was in use by another process. This generally indicates that the volume was being accessed by another application (here, the backup), which prevented the defrag operation from proceeding.

 

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:36:01 PM | Information | ORL-220-VS-03.ntm.org | 7036 | Service Control Manager | The Volume Shadow Copy service entered the running state. |
| 2/10/2017 | 11:36:49 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ST-General-VM01’ (‘ST-General-VM01’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |

 

  • After this, the Cluster Shared Volume became inaccessible and later went into a failed state.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:36:49 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ST-General-VM01’ (‘ST-General-VM01’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
| 2/10/2017 | 11:42:40 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘USHQ-FS-02-H’ (‘USHQ-FS-02-H’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
| 2/10/2017 | 11:43:20 PM | Error | ORL-220-VS-03.ntm.org | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘USHQ-FS-02-H’ (‘USHQ-FS-02-H’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished. |
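
When many 5120 events have to be reviewed, the volume name and status code can be pulled out of the description text programmatically. The sketch below is illustrative only: `parse_csv_pause`, `PAUSE_RE`, and the lookup table are hypothetical names introduced here, and the table holds just the one code seen in this incident, 0xC000020C, which corresponds to STATUS_CONNECTION_DISCONNECTED in ntstatus.h.

```python
import re

# Minimal lookup for the NTSTATUS value seen in this incident; extend as needed.
NTSTATUS_NAMES = {
    0xC000020C: "STATUS_CONNECTION_DISCONNECTED",
}

# Matches the Event ID 5120 description. The quote class accepts straight
# quotes as well as the typographic quotes (U+2018, U+2019) found in exports.
PAUSE_RE = re.compile(
    r"Cluster Shared Volume ['\u2018\u2019](?P<volume>[^'\u2018\u2019]+)['\u2018\u2019]"
    r".*?because of ['\u2018\u2019]\((?P<status>[0-9a-fA-F]{8})\)['\u2018\u2019]"
)


def parse_csv_pause(description: str):
    """Return (volume, status code, status name), or None if the text does not match."""
    match = PAUSE_RE.search(description)
    if match is None:
        return None
    status = int(match.group("status"), 16)
    return match.group("volume"), status, NTSTATUS_NAMES.get(status, "unknown")


message = (
    "Cluster Shared Volume 'USHQ-FS-02-H' ('USHQ-FS-02-H') has entered a "
    "paused state because of '(c000020c)'. All I/O will temporarily be "
    "queued until a path to the volume is reestablished."
)
volume, status, name = parse_csv_pause(message)
print(f"{volume}: 0x{status:08X} {name}")
```

A disconnected-connection status suggests that the node lost its path to the volume rather than that the disk itself failed, which fits the "until a path to the volume is reestablished" wording in the event.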

 

System Events:

 

  • Analyzed the logs but was not able to find anything specific related to the issue.

 

Cluster Events:

 

 

  • Verified the cluster logs and found that the movement of clustered virtual machines started around 12:16 PM. The move events captured below begin at 11:39:32 PM.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:39:32 PM | Information | ORL-220-VS-03.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘SCVMM ST-ADMIN-01 Resources’ is moving to cluster node ‘ORL-220-VS-02’. |
| 2/10/2017 | 11:39:59 PM | Information | ORL-220-VS-03.ntm.org | 1641 | Microsoft-Windows-FailoverClustering | Clustered role ‘ST-ADMIN-02’ is moving to cluster node ‘ORL-220-VS-04’. |
| 2/10/2017 | 11:42:26 PM | Information | ORL-220-VS-03.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state Online to state ProcessingFailure. |
| 2/10/2017 | 11:42:26 PM | Information | ORL-220-VS-03.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state ProcessingFailure to state WaitingToTerminate. Cluster resource ‘ST-General-VM03’ is waiting on the following resources: . |

 

____________________________________________________________________________________________

System Information: ORL-220-VS-04

 

OS Name        Microsoft Windows Server 2012 R2 Datacenter

Version        6.3.9600 Build 9600

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ORL-220-VS-04

System Manufacturer        Cisco Systems Inc

System Model        UCSB-B200-M3

System Type        x64-based PC

System SKU        

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

Processor        Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)

BIOS Version/Date        Cisco Systems, Inc. B200M3.2.2.1a.0.111220131105, 11/12/2013

 

 

Application Events:

 

  • Checked the events and found the VSS service repeatedly shutting down due to idle timeout.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:17:11 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/10/2017 | 11:39:02 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/10/2017 | 11:42:56 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |
| 2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again. |
| 2/10/2017 | 11:50:19 PM | Information | ORL-220-VS-04.ntm.org | 8224 | VSS | The VSS service is shutting down due to idle timeout. |

 

 

Cluster Events:

 

  • Found events for the resource movement around 12:16 PM, but did not see any errors.

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/10/2017 | 11:49:56 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state Offline to state OnlineCallIssued. |
| 2/10/2017 | 11:50:01 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state OnlineCallIssued to state OnlinePending. |
| 2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 1637 | Microsoft-Windows-FailoverClustering | Cluster resource ‘ST-General-VM03’ in clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ has transitioned from state OnlinePending to state Online. |
| 2/10/2017 | 11:50:02 PM | Information | ORL-220-VS-04.ntm.org | 1201 | Microsoft-Windows-FailoverClustering | The Cluster service successfully brought the clustered role ‘28a8dba1-091b-4b80-b8a7-6c88fd2ad9bd’ online. |

 

______________________________________________________________________________________________ 

Conclusion:

 

  • As per our discussion, the issue started after two simultaneous backups were initiated. In this backup architecture, when a backup starts, the filter driver associated with the backup application takes an exclusive handle on the volume, which generally produces errors like:

 

| Date | Time | Type/Level | Computer Name | Event Code | Source | Description |
| --- | --- | --- | --- | --- | --- | --- |
| 2/11/2017 | 3:03:28 AM | Error | ORL-220-VS-03.ntm.org | 257 | Microsoft-Windows-Defrag | The volume ST-General-VM01 (C:\ClusterStorage\ST-General-VM01) was not optimized because an error was encountered: The process cannot access the file because it is being used by another process. (0x80070020) |

 

  • The error code decodes as follows:

 

\err(vista).exe’ 0x80070020
# for hex 0x80070020 / decimal -2147024864
  STIERR_SHARING_VIOLATION                                       stierr.h
# as an HRESULT: Severity: FAILURE (1), FACILITY_WIN32 (0x7), Code 0x20
# for hex 0x20 / decimal 32
  ERROR_SHARING_VIOLATION                                        winerror.h
# The process cannot access the file because it is being used by another process.
# 2 matches found for "0x80070020"
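
The same breakdown can be reproduced without the lookup tool by splitting the HRESULT into its bit fields. This is a minimal Python sketch for illustration; `decode_hresult` is a hypothetical helper, not part of any Microsoft tooling.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    hr &= 0xFFFFFFFF  # normalise a negative (signed) value to unsigned 32-bit
    return {
        "severity": (hr >> 31) & 0x1,    # top bit: 1 = failure
        "facility": (hr >> 16) & 0x7FF,  # facility field: 7 = FACILITY_WIN32
        "code": hr & 0xFFFF,             # low 16 bits: underlying Win32 error
        "signed": hr - (1 << 32) if hr & 0x80000000 else hr,
    }


fields = decode_hresult(0x80070020)
print(fields)
```

For 0x80070020 this gives severity 1, facility 7, and code 32 (0x20, ERROR_SHARING_VIOLATION), with a signed decimal value of -2147024864, matching the lookup output above.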

 

During this time, if another application's filter driver tries to access the same volume, the Cluster Shared Volume can become inaccessible, and in some cases the entire CSV can go offline. This can be fixed by restarting the node that owned the resource.

 

 

  • Based on our discussion, I recommend not running multiple backup applications at the same time, as two filter drivers cannot safely operate on the same CSV at the same time.
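
One way to act on this recommendation is to check the planned backup windows for overlaps on the same CSV before scheduling them. The sketch below is illustrative only: the job names, volume names, and times are hypothetical and are not taken from this incident.

```python
from datetime import datetime
from itertools import combinations


def t(text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Hypothetical schedule: (job name, CSV name, window start, window end).
JOBS = [
    ("job-A", "CSV-1", t("2017-02-10 23:30"), t("2017-02-11 00:30")),
    ("job-B", "CSV-1", t("2017-02-10 23:35"), t("2017-02-11 00:10")),
    ("job-C", "CSV-2", t("2017-02-10 23:35"), t("2017-02-11 00:10")),
    ("job-D", "CSV-1", t("2017-02-11 01:00"), t("2017-02-11 02:00")),
]


def overlapping_jobs(jobs):
    """Return (job, job, volume) for each pair whose windows overlap on one CSV."""
    conflicts = []
    for a, b in combinations(jobs, 2):
        same_volume = a[1] == b[1]
        # Two intervals overlap when each one starts before the other ends.
        overlap = a[2] < b[3] and b[2] < a[3]
        if same_volume and overlap:
            conflicts.append((a[0], b[0], a[1]))
    return conflicts


print(overlapping_jobs(JOBS))
```

In this made-up schedule only job-A and job-B conflict, because they are the only pair that targets the same volume during overlapping windows.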

Ashutosh Dixit

I am currently working as a Senior Technical Support Engineer with VMware Premier Services for Telco. Before this, I worked as a Technical Lead with Microsoft Enterprise Platform Support for Production and Premier Support. I am an expert in High-Availability, Deployments, and VMware Core technology along with Tanzu and Horizon.
