Issue Description:
Node “HV11SG2”, running Windows Server 2012 R2 Datacenter, is a member of cluster ABCG2. The node went down after it lost access to the quorum.
Issue occurred on: 1/23/2017, 2:18:48 PM
_______________________________________________________________________________________________
System Information: AB1XY2
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name AB1XY2
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.6d.0.062220160055, 22/06/2016
System Events:
- Verified the logs and found that the Cluster Shared Volumes entered a paused state. The node then lost communication with AB9XY2 and AB5XY2, after which it was removed from the failover cluster.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:32 PM | Error | AB1XY2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘CSV-LegB04’ (‘ABCG2-CSV-LegB04’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:34 PM | Error | AB1XY2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘CSV-LegA-SQL01’ (‘ABCG2-CSV-SQL01’) has entered a paused state because of ‘(c00000c4)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:42 PM | Error | AB1XY2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-SQLBKP02’ (‘ABCG2-CSV-SQLBKP02’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:43 PM | Information | AB1XY2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB1XY2’ lost communication with cluster node ‘AB9XY2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:47 PM | Information | AB1XY2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB1XY2’ lost communication with cluster node ‘AB5XY2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:48 PM | Critical | AB1XY2.ad.xyz.com | 1177 | Microsoft-Windows-FailoverClustering | The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:48 PM | Critical | AB1XY2.ad.xyz.com | 1146 | Microsoft-Windows-FailoverClustering | The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.
1/23/2017 | 2:18:48 PM | Critical | AB1XY2.ad.xyz.com | 1146 | Microsoft-Windows-FailoverClustering | The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.
1/23/2017 | 2:18:48 PM | Error | AB1XY2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-India’ (‘ABCG2-CSV-India’) has entered a paused state because of ‘(c000000e)’. All I/O will temporarily be queued until a path to the volume is reestablished.
- The CSV error code (c000020c) points to a disconnect from the storage.
PS C:\Users\adix5025.INDIA\Downloads\ERR> & ‘.\err(vista).exe’ c000020c
# for hex 0xc000020c / decimal -1073741300
STATUS_CONNECTION_DISCONNECTED ntstatus.h
# The transport connection is now disconnected.
# 1 matches found for “c000020c”
PS C:\Users\adix5025.INDIA\Downloads\ERR>
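The lookup above can also be reproduced offline. The following is a minimal sketch, not a replacement for err.exe: it covers only the three status codes that appear in the Event 5120 entries of this report, with names as defined in ntstatus.h. The function name decode_ntstatus and the table NTSTATUS_NAMES are illustrative choices, not part of any Microsoft tool.

```python
# NTSTATUS codes that appear in the Event 5120 entries of this report.
# Symbolic names are the ones defined in ntstatus.h.
NTSTATUS_NAMES = {
    0xC000020C: "STATUS_CONNECTION_DISCONNECTED",
    0xC00000C4: "STATUS_UNEXPECTED_NETWORK_ERROR",
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
}


def decode_ntstatus(code_hex: str) -> tuple[int, str]:
    """Return (signed 32-bit decimal, symbolic name) for a hex NTSTATUS string."""
    value = int(code_hex, 16) & 0xFFFFFFFF
    # NTSTATUS is a signed 32-bit value, so codes with the top bit set are negative.
    signed = value - 0x100000000 if value & 0x80000000 else value
    return signed, NTSTATUS_NAMES.get(value, "UNKNOWN")


if __name__ == "__main__":
    for code in ("c000020c", "c00000c4", "c000000e"):
        signed, name = decode_ntstatus(code)
        print(f"0x{code} / decimal {signed}: {name}")
```

All three codes describe a lost network or device path, which is consistent with the storage disconnect noted above.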
- Because the node could not reach the storage, it also lost access to the quorum, after which it was removed from the failover cluster.
1/23/2017 | 2:19:06 PM | Error | AB1XY2.ad.xyz.com | 5005 | ENIC | Cisco VIC Ethernet Interface #2 : Has encountered an internal error and has failed.
1/23/2017 | 2:19:06 PM | Error | AB1XY2.ad.xyz.com | 5005 | ENIC | Cisco VIC Ethernet Interface #2 : Has encountered an internal error and has failed.
Application Events:
- Verified the logs and found nothing specific related to the issue.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:22:55 PM | Information | AB1XY2.ad.xyz.com | 8224 | VSS | The VSS service is shutting down due to idle timeout.
1/23/2017 | 2:23:20 PM | Information | AB1XY2.ad.xyz.com | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again.
1/23/2017 | 2:23:54 PM | Information | AB1XY2.ad.xyz.com | 5605 | Microsoft-Windows-WMI | The root\mscluster namespace is marked with the RequiresEncryption flag. Access to this namespace might be denied if the script or application does not have the appropriate authentication level. Change the authentication level to Pkt_Privacy and run the script or application again.
1/23/2017 | 2:24:35 PM | Warning | AB1XY2.ad.xyz.com | 5612 | Microsoft-Windows-WMI | Windows Management Instrumentation has stopped WMIPRVSE.EXE because a quota reached a warning value. Quota: ThreadCount Value: 5143 Maximum value: 256 WMIPRVSE PID: 28644 Providers hosted in this process: %SystemRoot%\System32\wbem\cluswmi.dll, %SystemRoot%\System32\wbem\cluswmi.dll, %windir%\system32\wbem\servercompprov.dll, %SystemRoot%\System32\smbwmiv2.dll, %SystemRoot%\System32\wbem\cluswmi.dll, %SystemRoot%\System32\wbem\cluswmi.dll, %SystemRoot%\System32\wbem\cluswmi.dll, C:\Windows\System32\iscsiwmi.dll, %systemroot%\system32\wbem\cimwin32.dll, %SystemRoot%\System32\wbem\cluswmi.dll, %SystemRoot%\system32\tscfgwmi.dll
Cluster Events:
- Verified the cluster logs and found the same set of events, which point to an issue with the network adapters.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:24 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.211:3343 connected to remote endpoint 192.00.23.223:3343.
1/23/2017 | 2:18:24 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.211:3343 connected to remote endpoint 192.00.23.212:3343.
1/23/2017 | 2:18:24 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.211:3343 connected to remote endpoint 192.00.23.219:3343.
1/23/2017 | 2:18:24 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.211:3343 connected to remote endpoint 172.00.123.215:3343.
1/23/2017 | 2:18:24 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.211:3343 connected to remote endpoint 192.00.23.217:3343.
1/23/2017 | 2:18:32 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has lost the UDP connection from local endpoint 192.00.23.211:3343 connected to remote endpoint 192.00.23.222:3343.
1/23/2017 | 2:18:32 PM | Information | AB1XY2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has lost the UDP connection from local endpoint 172.00.123.211:3343 connected to remote endpoint 172.00.123.222:3343.
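As a cross-check on the order of events on AB1XY2, the timestamps from the System and Cluster event entries above can be sorted onto a single timeline. This is only a sketch: the EVENTS list holds one hand-copied representative entry per event ID, and the helper name build_timeline is illustrative.

```python
from datetime import datetime

# (timestamp, event ID, summary) copied by hand from the AB1XY2 entries above,
# deliberately left unsorted. One representative entry per event ID.
EVENTS = [
    ("1/23/2017 2:18:48 PM", 1177, "Cluster service shutting down, quorum lost"),
    ("1/23/2017 2:18:32 PM", 5120, "CSV-LegB04 paused (c000020c)"),
    ("1/23/2017 2:19:06 PM", 5005, "Cisco VIC Ethernet Interface #2 failed"),
    ("1/23/2017 2:18:24 PM", 1650, "Missed two consecutive heartbeats"),
    ("1/23/2017 2:18:43 PM", 1592, "Lost communication with AB9XY2"),
]


def build_timeline(events):
    """Sort events by time and attach the offset in seconds from the first one."""
    parsed = sorted(
        (datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p"), event_id, text)
        for ts, event_id, text in events
    )
    start = parsed[0][0]
    return [
        (int((when - start).total_seconds()), event_id, text)
        for when, event_id, text in parsed
    ]


if __name__ == "__main__":
    for offset, event_id, text in build_timeline(EVENTS):
        print(f"+{offset:>3}s  Event {event_id}: {text}")
```

With these entries the heartbeat misses (1650) come first, followed by the CSV pause (5120) at +8 s, the communication loss (1592) at +19 s, the quorum loss (1177) at +24 s, and the ENIC 5005 error at +42 s.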
List of outdated drivers:
Time/Date String | Product Version | File Version | Company Name | File Description
5/14/2013 6:19 | (6.3:9391.6) | (4.4:13.0) | Chelsio Communications | Virtual Bus Driver for Chelsio ® T4 Chipset
2/3/2016 21:49 | (3.5:0.13) | (3.5:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver
_______________________________________________________________________________
System Information: AB2SG2
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name AB2SG2
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.6d.0.062220160055, 22/06/2016
System Events:
- Verified the logs and found the same set of events that were generated on the other nodes.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:38 PM | Error | AB2SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘CSV-LegA03’ (‘ABCG2-CSV-LegA03’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:38 PM | Error | AB2SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-SQLBKP02’ (‘ABCG2-CSV-SQLBKP02’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:40 PM | Error | AB2SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-SQLBKP01’ (‘ABCG2-CSV-SQLBKP01’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:41 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘HV10SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:43 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘AB9XY2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:45 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘HV6SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:19:11 PM | Critical | AB2SG2.ad.xyz.com | 1135 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB1XY2’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:19:11 PM | Critical | AB2SG2.ad.xyz.com | 1135 | Microsoft-Windows-FailoverClustering | Cluster node ‘HV12SG2’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
- Per Event 1592, communication was restored at around 3:34 PM.
1/23/2017 | 3:34:59 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘AB9XY2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 3:34:59 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘HV11SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 3:35:05 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘HV10SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 3:35:07 PM | Information | AB2SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB2SG2’ lost communication with cluster node ‘HV8SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
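For a rough figure on how long the disruption lasted, the two timestamps can be subtracted. This sketch assumes the outage is bounded by Event 1177 on AB1XY2 (2:18:48 PM, quorum lost) and the first 3:34:59 PM Event 1592 entry on AB2SG2; the logs do not state these as official start and end markers, and the helper name elapsed is illustrative.

```python
from datetime import datetime

# Timestamp layout used by the event entries in this report.
FMT = "%m/%d/%Y %I:%M:%S %p"


def elapsed(start: str, end: str):
    """Return the timedelta between two event-log timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Assumed bounds: Event 1177 on AB1XY2 to the first 3:34 PM Event 1592 on AB2SG2.
outage = elapsed("1/23/2017 2:18:48 PM", "1/23/2017 3:34:59 PM")
print(outage)  # 1:16:11
```

Under that assumption the disruption lasted about 1 hour 16 minutes.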
Cluster Events:
- Verified the cluster logs and found the same set of events, which point to an issue with the network adapters.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:24 PM | Information | AB2SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.212:3343 connected to remote endpoint 192.00.23.211:3343.
1/23/2017 | 2:18:25 PM | Information | AB2SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.212:3343 connected to remote endpoint 172.00.123.217:3343.
1/23/2017 | 2:18:25 PM | Information | AB2SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.212:3343 connected to remote endpoint 172.00.123.219:3343.
1/23/2017 | 2:18:25 PM | Information | AB2SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.212:3343 connected to remote endpoint 192.00.23.220:3343.
1/23/2017 | 2:18:25 PM | Information | AB2SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.212:3343 connected to remote endpoint 192.00.23.219:3343.
List of outdated drivers:
Time/Date String | Product Version | File Version | Company Name | File Description
5/14/2013 6:19 | (6.3:9391.6) | (4.4:13.0) | Chelsio Communications | Virtual Bus Driver for Chelsio ® T4 Chipset
2/3/2016 21:49 | (3.5:0.13) | (3.5:0.13) | Cisco Systems, Inc. | Cisco VIC Ethernet Driver
________________________________________________________________________________
System Information: AB3SG2
OS Name Microsoft Windows Server 2012 R2 Datacenter
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name AB3SG2
System Manufacturer Cisco Systems Inc
System Model UCSB-B200-M3
System Type x64-based PC
System SKU
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 2800 Mhz, 10 Core(s), 10 Logical Processor(s)
BIOS Version/Date Cisco Systems, Inc. B200M3.2.2.6d.0.062220160055, 22/06/2016
System Events:
- Verified the logs and found the same set of events that were generated on the other nodes.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:39 PM | Error | AB3SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-FILE04’ (‘ABCG2-CSV-FILE04’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:40 PM | Information | AB3SG2.ad.xyz.com | 1592 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB3SG2’ lost communication with cluster node ‘HV6SG2’. Network communication was reestablished. This could be due to communication temporarily being blocked by a firewall or connection security policy update. If the problem persists and network communication are not reestablished, the cluster service on one or more nodes will stop. If that happens, run the Validate a Configuration wizard to check your network configuration. Additionally, check for hardware or software errors related to the network adapters on this node, and check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:18:41 PM | Error | AB3SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-FILE03’ (‘ABCG2-CSV-FILE03’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:18:41 PM | Error | AB3SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘CSV-LegA-SQL01’ (‘ABCG2-CSV-SQL01’) has entered a paused state because of ‘(c000000e)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:19:05 PM | Error | AB3SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-BK1’ (‘ABCG2-CSV-BK1’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:19:05 PM | Error | AB3SG2.ad.xyz.com | 5120 | Microsoft-Windows-FailoverClustering | Cluster Shared Volume ‘ABCG2-CSV-India’ (‘ABCG2-CSV-India’) has entered a paused state because of ‘(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.
1/23/2017 | 2:19:11 PM | Critical | AB3SG2.ad.xyz.com | 1135 | Microsoft-Windows-FailoverClustering | Cluster node ‘AB1XY2’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:19:11 PM | Critical | AB3SG2.ad.xyz.com | 1135 | Microsoft-Windows-FailoverClustering | Cluster node ‘HV12SG2’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
1/23/2017 | 2:19:14 PM | Error | AB3SG2.ad.xyz.com | 4 | Microsoft-Windows-Security-Kerberos | The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server hv4sg2$. The target name used was MSServerClusterMgmtAPI/ABCG2.ad.xyz.com. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (AD.xyz.com) is different from the client domain (AD.xyz.com), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
Cluster Events:
- Verified the cluster logs and found the same set of events, which point to an issue with the network adapters.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
1/23/2017 | 2:18:24 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.213:3343 connected to remote endpoint 192.00.23.222:3343.
1/23/2017 | 2:18:24 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.213:3343 connected to remote endpoint 172.00.123.221:3343.
1/23/2017 | 2:18:25 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.213:3343 connected to remote endpoint 172.00.123.220:3343.
1/23/2017 | 2:18:27 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.213:3343 connected to remote endpoint 192.00.23.219:3343.
1/23/2017 | 2:18:27 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.213:3343 connected to remote endpoint 172.00.123.211:3343.
1/23/2017 | 2:18:28 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 172.00.123.213:3343 connected to remote endpoint 172.00.123.222:3343.
1/23/2017 | 2:18:29 PM | Information | AB3SG2.ad.xyz.com | 1650 | Microsoft-Windows-FailoverClustering | Cluster has missed two consecutive heartbeats for the local endpoint 192.00.23.213:3343 connected to remote endpoint 192.00.23.222:3343.
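To see at a glance which cluster networks were affected, the Event 1650 descriptions above can be grouped by local endpoint. The following is a small sketch that works on the description text only; the regular expression, the helper name peers_by_local_endpoint, and the three-entry SAMPLE list are illustrative and not part of any cluster tooling.

```python
import re
from collections import defaultdict

# Matches "local endpoint A:3343 connected to remote endpoint B:3343".
# The optional "~" tolerates stray markup around the port number.
ENDPOINTS = re.compile(
    r"local endpoint (\d+(?:\.\d+){3}):~?\d+~? "
    r"connected to remote endpoint (\d+(?:\.\d+){3}):~?\d+~?"
)


def peers_by_local_endpoint(messages):
    """Map each local endpoint IP to the sorted remote IPs it lost contact with."""
    peers = defaultdict(set)
    for message in messages:
        match = ENDPOINTS.search(message)
        if match:
            local_ip, remote_ip = match.groups()
            peers[local_ip].add(remote_ip)
    return {local: sorted(remotes) for local, remotes in peers.items()}


# The first three Event 1650 descriptions logged on AB3SG2.
SAMPLE = [
    "Cluster has missed two consecutive heartbeats for the local endpoint "
    "192.00.23.213:3343 connected to remote endpoint 192.00.23.222:3343.",
    "Cluster has missed two consecutive heartbeats for the local endpoint "
    "172.00.123.213:3343 connected to remote endpoint 172.00.123.221:3343.",
    "Cluster has missed two consecutive heartbeats for the local endpoint "
    "172.00.123.213:3343 connected to remote endpoint 172.00.123.220:3343.",
]

if __name__ == "__main__":
    for local, remotes in peers_by_local_endpoint(SAMPLE).items():
        print(local, "->", ", ".join(remotes))
```

In the entries above, heartbeats were missed on both the 192.00.23.x and the 172.00.123.x cluster networks at nearly the same time, so the problem does not appear to be limited to one network.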
________________________________________________________________________________
Conclusion:
- After analyzing the logs, we found that the issue started when node AB1XY2 lost communication with the other nodes of the cluster, as shown by Event ID 1650. The storage then also disconnected and the CSVs entered a paused state.
- Based on the error logged, the issue appears to be caused by the Cisco VIC Ethernet Interface entering a failed state. I recommend involving the Cisco team to troubleshoot this further.
- The following file system locations should be excluded from virus scanning on a server that is running Cluster Services:
- The path of the \mscs folder on the quorum hard disk. For example, exclude the Q:\mscs folder from virus scanning. (Applicable to Cluster 2003)
- The %Systemroot%\Cluster folder. (Applicable to Cluster 2003, 2008 & 2008 R2)
- The temp folder for the Cluster Service account. For example, exclude the \clusterserviceaccount\Local Settings\Temp folder from virus scanning. (Applicable to Cluster 2003)
- Follow this article to add the antivirus exclusions: http://support.microsoft.com/kb/309422
- Install the following hotfixes on all cluster nodes, one node at a time. A reboot is required for the changes to take effect. Follow the article and make sure all of these updates are installed on every node:
- Investigate the network timeouts, latency, and packet drops with the help of the in-house networking team.
Please note: this step is the most critical one when dealing with network connectivity issues.
Investigation of Network Issues:
We need to investigate the network connectivity issues with the help of the in-house networking team.
To avoid this issue in the future, the most critical step is to diagnose and investigate the persistent network connectivity issue on the cluster networks.
We need to check the network adapters, cables, and network configuration for the networks that connect the nodes.
We also need to check the hubs, switches, or bridges in the networks that connect the nodes.
We need to check for switch delays and proxy ARP with the help of the in-house networking team.