RCA- 4 - Cluster Resource Crashed or Deadlocked

Issue Description:

Event ID 1230, “Cluster resource ‘FileServer-(Condor)’ (resource type ‘’, DLL ‘clusres.dll’) either crashed or deadlocked.”, is being logged on cluster EKNCL04, which runs Microsoft Windows Server 2008 R2 Enterprise, Version 6.1.7601 Service Pack 1 Build 7601.

Initial Description:

In this case the resources fail over from one node to another. This generally happens when the node on which the resource was running is no longer capable of running it, for example because it cannot access storage or has lost network connectivity. Sometimes the node on which the resource was running is removed from the active failover cluster membership (Event ID 1135), which forces its resources to fail over to another node.

Why is Event ID 1135 logged?

This event is logged on all nodes in the Cluster except for the node that was removed. It is logged because one of the nodes in the Cluster marked that node as down and then notified all of the other nodes. When the nodes are notified, they discontinue and tear down their heartbeat connections to the downed node.

What caused the node to be marked down?

All nodes in a Windows Server 2008 or 2008 R2 Failover Cluster talk to each other over the networks that are set to “Allow cluster network communication on this network”. The nodes send heartbeat packets across these networks to all of the other nodes. Each packet is supposed to be received by the other node, which then sends a response back. Each node in the Cluster monitors its own heartbeats to ensure that the network and the other nodes are up. The following example should help clarify this:

If any one of these packets is not returned, the specific heartbeat is considered failed. For example, W2K8-R2-NODE2 sends a heartbeat request and receives a response from W2K8-R2-NODE1, so it determines that the network and the node are up. If W2K8-R2-NODE1 sends a request to W2K8-R2-NODE2 and does not get a response, it is considered a lost heartbeat and W2K8-R2-NODE1 keeps track of it. This missed response can cause W2K8-R2-NODE1 to show the network as down until another heartbeat request is received.

By default, Cluster nodes have a limit of 5 failures in 5 seconds before the connection is marked down. So if W2K8-R2-NODE1 does not receive the response 5 times in the time period, it considers that particular route to W2K8-R2-NODE2 to be down. If other routes are still considered to be up, W2K8-R2-NODE2 will remain as an active member.

If all routes are marked down for W2K8-R2-NODE2, it is removed from active Failover Cluster membership and the Event 1135 that you see in the first section is logged. On W2K8-R2-NODE2, the Cluster Service is terminated and then restarted so it can try to rejoin the Cluster.
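
The threshold logic described above can be sketched in a few lines of Python. This is only an illustration, not the Cluster service's actual implementation: it assumes one heartbeat per second on each route and counts consecutive unanswered heartbeats, and the function names and route names are hypothetical.

```python
from typing import Dict, List

# Default from the text above: 5 missed heartbeats before a route is marked down.
MISSED_HEARTBEAT_LIMIT = 5


def route_is_down(replies: List[bool], limit: int = MISSED_HEARTBEAT_LIMIT) -> bool:
    """Return True once `limit` consecutive heartbeats went unanswered.

    `replies` holds one entry per heartbeat sent on a single route
    (assumed here to be one per second): True means a response arrived.
    """
    missed = 0
    for answered in replies:
        missed = 0 if answered else missed + 1
        if missed >= limit:
            return True
    return False


def node_is_removed(routes: Dict[str, List[bool]]) -> bool:
    """A node leaves active membership only when every route to it is down."""
    return bool(routes) and all(route_is_down(r) for r in routes.values())


if __name__ == "__main__":
    # Hypothetical heartbeat history from W2K8-R2-NODE1 to W2K8-R2-NODE2.
    routes = {
        "Cluster Network 1": [True, False, False, False, False, False],
        "Cluster Network 2": [True, True, True, True, True, True],
    }
    # One route is still up, so the node stays an active member.
    print(node_is_removed(routes))
```

With a second healthy route the node stays in the membership; Event 1135 corresponds to the case where `node_is_removed` would return True.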

Reference :

Having a problem with nodes being removed from active Failover Cluster membership?

http://blogs.technet.com/b/askcore/archive/2012/02/08/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx

________________________________________________________________________

System Information: CLSTRFILE04

OS Name Microsoft Windows Server 2008 R2 Enterprise

Version 6.1.7601 Service Pack 1 Build 7601

Other OS Description Not Available

OS Manufacturer Microsoft Corporation

System Name CLSTRFILE04

System Manufacturer VMware, Inc.

System Model VMware Virtual Platform

System Type x64-based PC

Processor Intel(R) Xeon(R) CPU E5649 @ 2.53GHz, 2533 Mhz, 2 Core(s), 2 Logical Processor(s)

BIOS Version/Date Phoenix Technologies LTD 6.00, 30/07/2013

System Events:

The System log shows that cluster node ABCFILE08 was removed from the active failover cluster membership at 4:29:55 PM.

Date	Time	Type/Level	Computer Name	Event Code	Source	Description
6/1/2016	4:29:55 PM	Critical	CLSTRFILE04.ABC.com	1135	Microsoft-Windows-FailoverClustering	Cluster node ‘ABCFILE08’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Application Events:

Checked the application logs but was not able to find any event related to the issue.

List of outdated drivers:

Time/Date String	Product Version	File Version	Company Name	File Description
2/28/2007 0:04	(6.0:6001.16459)	(7.2:0.0)	Adaptec, Inc.	Adaptec StorPort Ultra320 SCSI Driver (X64)
3/20/2009 18:36	(3.6:1540.127)	(3.6:1540.127)	AMD Technologies Inc.	AMD Technology AHCI Compatible Controller Driver for Windows – AMD64 platform
1/14/2009 19:27	(5.2:0.16119)	(5.2:0.16119)	Adaptec, Inc.	Adaptec SAS RAID WS03 Driver
4/26/2009 12:14	(10.100:4.0)	(10.100:4.0)	Broadcom Corporation	Broadcom NetXtreme Gigabit Ethernet NDIS6.x Unified Driver.
8/7/2006 2:51	(1.0:1.1)	(1.0:1.6)	Brother Industries Ltd.	Brother Serial I/F Driver (WDM)
8/7/2006 2:51	(6.0:5479.0)	(1.0:0.12)	Brother Industries Ltd.	Brother USB MDM Driver
2/13/2009 22:18	(4.8:2.0)	(4.8:2.0)	Broadcom Corporation	Broadcom NetXtreme II GigE VBD
5/29/2008 0:14	(6.0:6001.18000)	(8.4:1.0)	Intel Corporation	Intel(R) PRO/1000 Adapter NDIS 6 deserialized driver
12/31/2008 16:29	(4.8:13.0)	(4.8:13.0)	Broadcom Corporation	Broadcom NetXtreme II 10 GigE VBD
12/13/2005 21:47	(0.4:22.0)	(5.4:22.0)	Intel Corp./ICP vortex GmbH	Intel/ICP Raid Storport Driver
4/16/2009 23:13	(6.1:7083.0)	(1.28:3.67)	LSI Corporation	LSI Fusion-MPT SCSI Driver (StorPort)
5/19/2009 2:09	(4.5:1.64)	(4.5:1.64)	LSI Corporation	MEGASAS RAID Controller Driver for Windows 7\Server 2008 R2 for x64
5/19/2009 2:25	13.05.0409.2009	(13.5:409.2009)	LSI Corporation, Inc.	LSI MegaRAID Software RAID Driver
6/6/2006 22:11	(7.10:0.0)	(7.10:0.0)	IBM Corporation	IBM ServeRAID Controller Driver
8/10/2007 0:47	(1.2:78.3)	(1.2:78.3)	Intel Corporation	Intel(R) 5000 Series Chipsets Integrated Device – 1A38
1/22/2009 23:05	(9.1:8.6)	(9.1:8.6)	QLogic Corporation	QLogic Fibre Channel Stor Miniport Driver
5/19/2009 2:18	(2.1:3.20)	(2.1:3.20)	QLogic Corporation	QLogic iSCSI Storport Miniport Driver
9/13/2006 14:18	(4.3:86.0)	(4.3:86.0)	Macrovision Corporation, Macrovision Europe Limited, and Macrovision Japan and Asia K.K.	Macrovision SECURITY Driver
7/14/2009 0:19	(6.0:6000.170)	(6.0:6000.170)	VIA Technologies, Inc.	VIA Generic PCI IDE Bus Driver
1/31/2009 1:18	(6.0:6000.6210)	(6.0:6000.6210)	VIA Technologies Inc.,Ltd	VIA RAID DRIVER FOR AMD-X86-64

Cluster Events:

Checked the events and found that the cluster networks are coming online.

6/1/2016	4:29:55 PM	Information	CLSTRFILE04.ABC.com	1204	Microsoft-Windows-FailoverClustering	The Cluster service successfully brought the clustered service or application ‘Available Storage’ offline.
6/1/2016	4:29:55 PM	Information	CLSTRFILE04.ABC.com	1125	Microsoft-Windows-FailoverClustering	Cluster network interface ‘CLSTRFILE04 – Service LAN’ for cluster node ‘CLSTRFILE04’ on network ‘Cluster Network 3’ is operational (up). The node can communicate with all other available failover cluster nodes on the network.

______________________________________________________________________________

System Information: ABCFILE07

OS Name Microsoft Windows Server 2008 R2 Enterprise

Version 6.1.7601 Service Pack 1 Build 7601

Other OS Description Not Available

OS Manufacturer Microsoft Corporation

System Name ABCFILE07

System Manufacturer HP

System Model ProLiant DL360p Gen8

System Type x64-based PC

Processor Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, 2500 Mhz, 6 Core(s), 12 Logical Processor(s)

BIOS Version/Date HP P71, 08/09/2013

System Events:

Event ID 1085, related to Folder Redirection, is logged at 4:25:49 PM.

At 4:29:55 PM, cluster node ABCFILE08 is removed from the active failover cluster membership.

Date	Time	Type/Level	Computer Name	Event Code	Source	Description
6/1/2016	4:25:49 PM	Warning	ABCFILE07.ABC.com	1085	Microsoft-Windows-GroupPolicy	Windows failed to apply the Folder Redirection settings. Folder Redirection settings might have its own log file. Please click on the ‘More information’ link.
6/1/2016	4:29:55 PM	Critical	ABCFILE07.ABC.com	1135	Microsoft-Windows-FailoverClustering	Cluster node ‘ABCFILE08’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Application Events:

Date	Time	Type/Level	Computer Name	Event Code	Source	Description
6/1/2016	4:25:48 PM	Error	ABCFILE07.ABC.com	502	Microsoft-Windows-Folder Redirection	Failed to apply policy and redirect folder ‘Documents’ to ‘\\abcfs01\abc$\dave.farmer\My Documents’. Redirection options=0x9231. The following error occurred: ‘Can not create folder ‘\\abcfs01\abc$\dave.farmer\My Documents”. Error details: ‘This security ID may not be assigned as the owner of this object. ‘.

Cluster Events:

6/1/2016	4:29:55 PM	Information	ABCFILE07.ABC.com	1125	Microsoft-Windows-FailoverClustering	Cluster network interface ‘CLSTRFILE04 – Service LAN’ for cluster node ‘CLSTRFILE04’ on network ‘Cluster Network 3’ is operational (up). The node can communicate with all other available failover cluster nodes on the network.

List of outdated drivers:

Time/Date String	Product Version	File Version	Company Name	File Description
12/5/2008 23:54	(6.1:3790.0)	(1.6:6.4)	Adaptec, Inc.	Adaptec Windows SAS/SATA Storport Driver
5/1/2007 18:30	(6.0:3790.16512)	(1.6:6.1)	Adaptec, Inc.	Adaptec Windows SATA Storport Driver
2/28/2007 0:04	(6.0:6001.16459)	(7.2:0.0)	Adaptec, Inc.	Adaptec StorPort Ultra320 SCSI Driver (X64)
3/19/2010 16:18	(1.1:2.5)	(1.1:2.5)	Advanced Micro Devices	Storage Filter Driver
2/13/2009 22:18	(4.8:2.0)	(4.8:2.0)	Broadcom Corporation	Broadcom NetXtreme II GigE VBD
2/3/2009 22:52	(7.2:10.211)	(7.2:10.211)	Emulex	Storport Miniport Driver for LightPulse HBAs
12/31/2008 16:29	(4.8:13.0)	(4.8:13.0)	Broadcom Corporation	Broadcom NetXtreme II 10 GigE VBD
4/24/2003 19:03	(6.0:1.0)	(6.0:1.0)	Broadcom Corporation	Frame Access Driver
6/11/2010 1:46	(8.6:2.1014)	(8.6:2.1014)	Intel Corporation	Intel Matrix Storage Manager driver – x64
12/13/2005 21:47	(0.4:22.0)	(5.4:22.0)	Intel Corp./ICP vortex GmbH	Intel/ICP Raid Storport Driver
12/2/2009 21:36	(5.2:3790.1830)	(1.3:0.4)	Intel Corporation	Intel(R) Network Adapter Diagnostic Driver
5/19/2009 2:09	(4.5:1.64)	(4.5:1.64)	LSI Corporation	MEGASAS RAID Controller Driver for Windows 7\Server 2008 R2 for x64
5/19/2009 2:25	13.05.0409.2009	(13.5:409.2009)	LSI Corporation, Inc.	LSI MegaRAID Software RAID Driver
6/6/2006 22:11	(7.10:0.0)	(7.10:0.0)	IBM Corporation	IBM ServeRAID Controller Driver
8/10/2007 0:47	(1.2:78.3)	(1.2:78.3)	Intel Corporation	Intel(R) 5000 Series Chipsets Integrated Device – 1A38
1/22/2009 23:05	(9.1:8.6)	(9.1:8.6)	QLogic Corporation	QLogic Fibre Channel Stor Miniport Driver
5/19/2009 2:18	(2.1:3.20)	(2.1:3.20)	QLogic Corporation	QLogic iSCSI Storport Miniport Driver
9/24/2008 19:28	(5.1:1039.2600)	(5.1:1039.2600)	Silicon Integrated Systems Corp.	SiS RAID Stor Miniport Driver
2/17/2009 23:03	(5.0:1.1)	(5.0:1.1)	Promise Technology	Promise SuperTrak EX Series Driver for Windows

__________________________________________________________________________________

System Information: ABCFILE08

OS Name Microsoft Windows Server 2008 R2 Enterprise

Version 6.1.7601 Service Pack 1 Build 7601

Other OS Description Not Available

OS Manufacturer Microsoft Corporation

System Name ABCFILE08

System Manufacturer HP

System Model ProLiant DL360p Gen8

System Type x64-based PC

Processor Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz, 2500 Mhz, 6 Core(s), 12 Logical Processor(s)

BIOS Version/Date HP P71, 9/8/2013

System Events:

Schannel errors (Event ID 36888) with internal error state 1203 are logged, followed by Event ID 1230 for the file server resource.

Date	Time	Type/Level	Computer Name	Event Code	Source	Description
6/1/2016	3:50:52 PM	Error	ABCFILE08.ABC.com	36888	Schannel	The following fatal alert was generated: 10. The internal error state is 1203.
6/1/2016	3:50:52 PM	Error	ABCFILE08.ABC.com	36888	Schannel	The following fatal alert was generated: 10. The internal error state is 1203.
6/1/2016	4:19:05 PM	Error	ABCFILE08.ABC.com	1230	Microsoft-Windows-FailoverClustering	Cluster resource ‘FileServer-(Condor)’ (resource type ‘’, DLL ‘clusres.dll’) either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.

Event ID 1230 shows that the resource hosted by clusres.dll deadlocked.

Immediately after the Event ID 2012 warnings, all the resources started to fail, which generally points to a problem on the networking side.

As per the Article: https://support.microsoft.com/en-us/kb/2885205

Event 2012 data, in Words:

0000: 00040000 002C0001 00000000 800007DC

0010: 00000000 C0000184 00000000 00000000

0020: 00000000 00000000 0000058F

C0000184 = STATUS_INVALID_DEVICE_STATE , The device is not in a valid state to perform this request.

This is an error that the network driver returns to SRV on the send IRPs. It usually indicates that a send was issued on a connection that is no longer in a valid state for sending. For example, a send issued before a connection has reached the connected state returns STATUS_INVALID_DEVICE_STATE. If a disconnect has been initiated, the same error is returned for any further sends.
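
Picking the status code out of the Words dump can be sketched as follows. This is an illustrative helper, not a Microsoft tool: it relies only on the NTSTATUS convention that the top two bits hold the severity (binary 11 means error), and the lookup table deliberately contains just the one code named above. The function names are hypothetical.

```python
# Event 2012 data exactly as shown above (offset, then 32-bit words in hex).
DUMP = """\
0000: 00040000 002C0001 00000000 800007DC
0010: 00000000 C0000184 00000000 00000000
0020: 00000000 00000000 0000058F
"""

# Only the status code discussed in this report; extend as needed.
KNOWN_NTSTATUS = {0xC0000184: "STATUS_INVALID_DEVICE_STATE"}


def parse_words(dump: str) -> list:
    """Turn the 'offset: word word ...' dump into a flat list of integers."""
    words = []
    for line in dump.splitlines():
        _offset, sep, data = line.partition(":")
        if not sep:
            continue  # skip lines without an offset prefix
        words.extend(int(token, 16) for token in data.split())
    return words


def error_statuses(words: list) -> list:
    """Keep the words whose top two bits are 11 (NTSTATUS error severity)."""
    return [w for w in words if (w >> 30) == 0b11]


if __name__ == "__main__":
    for status in error_statuses(parse_words(DUMP)):
        print(f"0x{status:08X}", KNOWN_NTSTATUS.get(status, "not in lookup table"))
```

For this dump the only word with error severity is 0xC0000184. The word 0x800007DC has warning severity, and its low 16 bits equal 2012, which matches the event ID.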

As per the Article: https://blogs.technet.microsoft.com/yongrhee/2015/05/16/event-id-2012-while-transmitting-or-receiving-data-the-server-encountered-a-network-error/

Cause:

=======

1. Antivirus Filter driver interfering with the network stack

2. An outdated or bad network card driver

3. A bad NIC

4. Network Teaming software

5. WAN Optimization devices

6. Mismatched Speed and Duplex settings between the NIC and switch

7. A spotty connection to a switch port

Resolution:

==========

Make sure that the firmware for the network switches, WAN accelerators, and routers is up to date.
Update the NIC firmware and driver.
Update the NIC teaming software and driver.
Update the antivirus software, or uninstall it completely (for relief, then follow up with the AV vendor).
Manually set the speed and duplex.
Replace the network cable(s).
Try a different switch port.
For WAN optimizers, try encapsulating the packets with IPsec to keep them from being modified.

6/1/2016	4:26:30 PM	Warning	ABCFILE08.ABC.com	2012	srv	While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration. The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.
6/1/2016	4:26:30 PM	Critical	ABCFILE08.ABC.com	1146	Microsoft-Windows-FailoverClustering	The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.
6/1/2016	4:26:30 PM	Warning	ABCFILE08.ABC.com	2012	srv	While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration. The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.
6/1/2016	4:26:30 PM	Error	ABCFILE08.ABC.com	1069	Microsoft-Windows-FailoverClustering	Cluster resource ‘FileServer-(Condor)’ in clustered service or application ‘Condor’ failed.

The cluster disks then started to fail with Ntfs errors.

6/1/2016	4:26:37 PM	Error	ABCFILE08.ABC.com	137	Ntfs	The default transaction resource manager on volume T: encountered a non-retryable error and could not start. The data contains the error code.
6/1/2016	4:26:59 PM	Error	ABCFILE08.ABC.com	1069	Microsoft-Windows-FailoverClustering	Cluster resource ‘Disk N:\’ in clustered service or application ‘Condor’ failed.

After the machine was restarted, events related to the NIC team and to iSCSI were logged.

6/1/2016	4:32:53 PM	Warning	ABCFILE08.ABC.com	461	CPQTeamMP	Team ID: 0 Aggregation ID: 0 Team Member ID: 0 PROBLEM: 802.3ad link aggregation (LACP) has failed. ACTION: Ensure all ports are connected to LACP-aware devices.
6/1/2016	4:33:02 PM	Warning	ABCFILE08.ABC.com	434	CPQTeamMP	HP Network Team #1: PROBLEM: A non-Primary Network Link is not receiving. Receive-path validation has been enabled for this Team by selecting the Enable receive-path validation Heartbeat Setting. ACTION: Please check your cabling to the link partner. Check the switch port status, including verifying that the switch port is not configured as a Switch-assist Channel. Generate Broadcast traffic on the network to test whether these are being received. Also make sure all teamed NICs are on the same broadcast domain. Run diagnostics to test card. Drop the NIC from the team, determine whether it is receiving broadcast traffic in that configuration.
6/1/2016	4:35:03 PM	Error	ABCFILE08.ABC.com	103	MSiSCSI	Timeout waiting for iSCSI persistently bound volumes. If there are any services or applications that use information stored on these volumes then they may not start or may report errors.

Application Events:

The Application log points to a problem with the connections between the server and the SAN.

Date	Time	Type/Level	Computer Name	Event Code	Source	Description
6/1/2016	4:35:10 PM	Error	ABCFILE08.ABC.com	2004	Microsoft-Windows-PerfNet	Unable to open the Server service performance object. The first four bytes (DWORD) of the Data section contains the status code.
6/1/2016	4:35:32 PM	Warning	ABCFILE08.ABC.com	281	SnapDrive	Failed to get data for an iSCSI HBA. HBA WMI class instance name: Root\ISCSIPRT\0000_0 Error code = 0x8004100c Error description = WDM specific return code: 4200
6/1/2016	4:35:37 PM	Warning	ABCFILE08.ABC.com	317	SnapDrive	Failed to enumerate LUN. Device path: ‘\\?\mpio#disk&ven_netapp&prod_lun&rev_811a#1&7f6ac24&0&3630413938303033373asdas32232135413330373835363730#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}‘ Storage path: ‘/vol/vol_ISCSI_EKNCL04_QUORUM/qtree_ISCSI_EKNCL04_QUORUM/lun_ISCSI_EKNCL04_QUORUM’ SCSI address: (3,0,0,0) Error code: 0xc00402fa Error description: A LUN with device path \\?\mpio#disk&ven_netapp&prod_lun&rev_811a#1&7f6ac24&0&36304139383030333735343333344637313544333335413330373835363730#{53f56307-b6bf-11sa21312130a0c91efb8b} and SCSI address (3, 0, 0, 0) is exposed through an unsupported initiator.

Cluster Events:

6/1/2016	4:35:30 PM	Information	ABCFILE08.ABC.com	1062	Microsoft-Windows-FailoverClustering	This node has successfully joined the failover cluster ‘EKNCL04’.

Cluster Logs:

00000c88.00002938::2016/06/01-15:06:02.872 ERR mscs::TopologyPersister::TryGetNetworkPrivateProperties: ERROR_FILE_NOT_FOUND(2)’ because of ‘OpenSubKey failed.’

00000c88.00002938::2016/06/01-15:06:02.872 INFO [NM] Received request from client address ABCFILE08.

000015c4.000055dc::2016/06/01-15:06:04.447 WARN [RES] File Server <FileServer-(Condor)>: Failed in NetShareGetInfo(Condor, PST Exports from old server), status 2310. Tolerating…

000015c4.000055dc::2016/06/01-15:06:04.463 WARN [RES] File Server <FileServer-(Condor)>: Failed in NetShareGetInfo(Condor, sp4$), status 2310. Tolerating…

000015c4.000015d4::2016/06/01-15:19:05.014 ERR [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out for resource ‘FileServer-(Condor)’.

000015c4.000015d4::2016/06/01-15:19:05.014 INFO [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3.

000015c4.000015d4::2016/06/01-15:19:05.014 ERR [RHS] Resource FileServer-(Condor) handling deadlock. Cleaning current operation and terminating RHS process.

000015c4.000015d4::2016/06/01-15:19:05.014 ERR [RHS] About to send WER report.

00000c88.0000369c::2016/06/01-15:19:05.014 WARN [RCM] HandleMonitorReply: FAILURENOTIFICATION for ‘FileServer-(Condor)’, gen(0) result 4.

00000c88.0000369c::2016/06/01-15:19:05.014 INFO [RCM] rcm::RcmResource::HandleMonitorReply: Resource ‘FileServer-(Condor)’ consecutive failure count 1.

00000c88.00007224::2016/06/01-15:25:29.369 ERR [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_CALL_TIMED_OUT(5910)’ because of ‘Control(STORAGE_GET_DISK_INFO) to resource ‘Disk L:\’ timed out.’

00000c88.00007224::2016/06/01-15:25:29.369 WARN [RCM] ResourceControl(STORAGE_GET_DISK_INFO) to Disk L:\ returned 5910.

000015c4.00007b6c::2016/06/01-15:26:29.960 WARN [RES] File Server <FileServer-(Condor)>: Failed in NetShareGetInfo(Condor, sp4$), status 2310. Tolerating…

00000c88.00006594::2016/06/01-15:26:30.927 INFO [RCM] rcm::RcmResource::ReattachToMonitorProcess: (IP Address 193.27.213.16, Offline)
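
When correlating these entries with the event logs, note that the cluster log timestamps appear to be one hour behind the event log times: the ISALIVE timeout at 15:19:05 lines up with Event 1230 at 4:19:05 PM, which would be consistent with the cluster log being written in UTC. The sketch below splits a cluster log line into its fields and applies that offset. The one-hour offset is an inference from this data set, and the function and field names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# process id . thread id :: timestamp  LEVEL  message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>ERR|WARN|INFO)\s+(?P<msg>.*)$"
)

# Assumption inferred from this report: event log time = cluster log time + 1 hour.
LOCAL_OFFSET = timedelta(hours=1)


def parse_line(line: str) -> Optional[dict]:
    """Split one cluster log line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    logged = datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
    return {
        "pid": int(match.group("pid"), 16),
        "tid": int(match.group("tid"), 16),
        "logged": logged,                # time as written in the cluster log
        "local": logged + LOCAL_OFFSET,  # time to compare with the event logs
        "level": match.group("level"),
        "message": match.group("msg"),
    }


# Line taken from the log above (typographic quotes replaced with ASCII ones).
SAMPLE = (
    "000015c4.000015d4::2016/06/01-15:19:05.014 ERR [RHS] "
    "RhsCall::DeadlockMonitor: Call ISALIVE timed out for resource "
    "'FileServer-(Condor)'."
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry["level"], entry["local"].strftime("%H:%M:%S"), entry["message"])
```

Filtering the parsed entries on `level == "ERR"` gives a quick view of the deadlock and timeout entries without reading the whole log.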

List of outdated drivers:

Time/Date String	Product Version	File Version	Company Name	File Description
12/5/2008 23:54	(6.1:3790.0)	(1.6:6.4)	Adaptec, Inc.	Adaptec Windows SAS/SATA Storport Driver
5/1/2007 18:30	(6.0:3790.16512)	(1.6:6.1)	Adaptec, Inc.	Adaptec Windows SATA Storport Driver
2/28/2007 0:04	(6.0:6001.16459)	(7.2:0.0)	Adaptec, Inc.	Adaptec StorPort Ultra320 SCSI Driver (X64)
3/19/2010 16:18	(1.1:2.5)	(1.1:2.5)	Advanced Micro Devices	Storage Filter Driver
2/13/2009 22:18	(4.8:2.0)	(4.8:2.0)	Broadcom Corporation	Broadcom NetXtreme II GigE VBD
2/3/2009 22:52	(7.2:10.211)	(7.2:10.211)	Emulex	Storport Miniport Driver for LightPulse HBAs
12/31/2008 16:29	(4.8:13.0)	(4.8:13.0)	Broadcom Corporation	Broadcom NetXtreme II 10 GigE VBD
4/24/2003 19:03	(6.0:1.0)	(6.0:1.0)	Broadcom Corporation	Frame Access Driver
6/11/2010 1:46	(8.6:2.1014)	(8.6:2.1014)	Intel Corporation	Intel Matrix Storage Manager driver – x64
12/13/2005 21:47	(0.4:22.0)	(5.4:22.0)	Intel Corp./ICP vortex GmbH	Intel/ICP Raid Storport Driver
12/2/2009 21:36	(5.2:3790.1830)	(1.3:0.4)	Intel Corporation	Intel(R) Network Adapter Diagnostic Driver
5/19/2009 2:09	(4.5:1.64)	(4.5:1.64)	LSI Corporation	MEGASAS RAID Controller Driver for Windows 7\Server 2008 R2 for x64
5/19/2009 2:25	13.05.0409.2009	(13.5:409.2009)	LSI Corporation, Inc.	LSI MegaRAID Software RAID Driver
6/6/2006 22:11	(7.10:0.0)	(7.10:0.0)	IBM Corporation	IBM ServeRAID Controller Driver
8/10/2007 0:47	(1.2:78.3)	(1.2:78.3)	Intel Corporation	Intel(R) 5000 Series Chipsets Integrated Device – 1A38
1/22/2009 23:05	(9.1:8.6)	(9.1:8.6)	QLogic Corporation	QLogic Fibre Channel Stor Miniport Driver
5/19/2009 2:18	(2.1:3.20)	(2.1:3.20)	QLogic Corporation	QLogic iSCSI Storport Miniport Driver
9/24/2008 19:28	(5.1:1039.2600)	(5.1:1039.2600)	Silicon Integrated Systems Corp.	SiS RAID Stor Miniport Driver
2/17/2009 23:03	(5.0:1.1)	(5.0:1.1)	Promise Technology	Promise SuperTrak EX Series Driver for Windows

_________________________________________________________________

Conclusion:

The logs show that the issue started on the networking side: network connectivity was lost on node ABCFILE08, which led to Event ID 1135 and the node being removed from the active cluster membership. After the machine was restarted and the network came back, the node rejoined the cluster at 4:35 PM. The Event ID 2012 warnings on ABCFILE08 usually indicate that a send was issued on a connection that is no longer in a valid state for sending.
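
The sequence of events can be laid out as a small timeline to see how long each phase lasted. The sketch below uses only the event times quoted in this report; the short descriptions and the helper name are shorthand added for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# (time, node that logged it, event ID, short description), all from this report.
EVENTS = [
    ("6/1/2016 3:50:52 PM", "ABCFILE08", 36888, "Schannel fatal alert"),
    ("6/1/2016 4:19:05 PM", "ABCFILE08", 1230, "FileServer-(Condor) crashed or deadlocked"),
    ("6/1/2016 4:26:30 PM", "ABCFILE08", 2012, "srv network error"),
    ("6/1/2016 4:26:30 PM", "ABCFILE08", 1146, "RHS stopped unexpectedly"),
    ("6/1/2016 4:26:37 PM", "ABCFILE08", 137, "Ntfs error on volume T:"),
    ("6/1/2016 4:29:55 PM", "ABCFILE07", 1135, "ABCFILE08 removed from membership"),
    ("6/1/2016 4:32:53 PM", "ABCFILE08", 461, "LACP link aggregation failed"),
    ("6/1/2016 4:35:30 PM", "ABCFILE08", 1062, "ABCFILE08 rejoined EKNCL04"),
]


def seconds_between(first_id: int, second_id: int) -> int:
    """Seconds from the first occurrence of one event ID to that of another."""
    first_seen = {}
    for stamp, _node, event_id, _text in EVENTS:
        first_seen.setdefault(event_id, datetime.strptime(stamp, TIME_FORMAT))
    return int((first_seen[second_id] - first_seen[first_id]).total_seconds())


if __name__ == "__main__":
    for stamp, node, event_id, text in sorted(
        EVENTS, key=lambda e: datetime.strptime(e[0], TIME_FORMAT)
    ):
        print(f"{stamp}  {node}  {event_id:>5}  {text}")
    print("deadlock to removal (s):", seconds_between(1230, 1135))
    print("removal to rejoin (s):", seconds_between(1135, 1062))
```

On these figures, about 11 minutes pass between the resource deadlock and the removal of the node, and the node is back in the cluster roughly 5.5 minutes after that.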

Make sure that the firmware for the network switches, WAN accelerators, and routers is up to date.
Update the NIC firmware and driver.
Update the NIC teaming software and driver.
Update the antivirus software, or uninstall it completely (for relief, then follow up with the AV vendor).
Manually set the speed and duplex.
Replace the network cable(s).
Try a different switch port.
For WAN optimizers, try encapsulating the packets with IPsec to keep them from being modified.

Install the following hotfixes on all cluster nodes, one node at a time. A reboot is required for the changes to take effect. Follow the article and make sure that all of these updates are installed on every node:

Updates for Cluster Binaries for 2008 R2 : https://support.microsoft.com/en-us/kb/2545685

Investigate network timeouts, latency, and packet drops with the help of the in-house networking team.

Please note: this step is the most critical when dealing with network connectivity issues.

Investigation of Network Issues:

We need to investigate the network connectivity issues with the help of the in-house networking team.

To avoid this issue in the future, the most critical step is to diagnose the recurring network connectivity problem on the cluster networks.

We need to check the network adapters, cables, and network configuration for the networks that connect the nodes.

We also need to check the hubs, switches, and bridges in the networks that connect the nodes.

We need to check for switch delays and proxy ARP with the help of the in-house networking team.