System Information: 2ABCRV038
OS Name Microsoft Windows Server 2012 R2 Standard
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name 2ABCRV038
System Manufacturer Dell Inc.
System Model PowerEdge R730xd
System Type x64-based PC
System SKU SKU=NotProvided;ModelName=PowerEdge R730xd
Processor Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 2200 Mhz, 12 Core(s), 24 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 2200 Mhz, 12 Core(s), 24 Logical Processor(s)
BIOS Version/Date Dell Inc. 2.3.4, 08/11/2016
System Events:
- The issue started when the File Share Witness failed its periodic health check. This check is performed by the Cluster service, so verify that the cluster is able to communicate with the file share.
- After that, the File Share Witness resource went into a failed state.
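To verify from a node that the witness share is reachable and writable, you can approximate the witness's own "create or open directory" step, which is visible in the cluster log entries in this report. The Python sketch below makes several assumptions. The function name, the throwaway directory name, and the probe-file step are for illustration only; they are not the Cluster service's logic. It also runs under the account that launches it. The witness share is typically accessed with the cluster's computer account (the cluster name object), so a success under your own account does not prove that the cluster has access.

```python
import os
import uuid


def check_witness_share(share_path: str, dir_name: str) -> tuple[bool, str]:
    """Create or open share_path/dir_name, then write and delete a probe file.

    Hypothetical helper that mirrors the "create or open directory" step seen
    in the witness log lines. Returns (True, "ok") on success, or
    (False, "error <code>: <text>") on failure.
    """
    target = os.path.join(share_path, dir_name)
    try:
        try:
            os.mkdir(target)  # fails if share_path itself is unreachable
        except FileExistsError:
            pass  # directory already exists, which is fine
        # Confirm write access, not just name lookup.
        probe = os.path.join(target, f"probe-{uuid.uuid4().hex}.tmp")
        with open(probe, "w", encoding="ascii") as fh:
            fh.write("probe")
        os.remove(probe)
        return True, "ok"
    except OSError as exc:
        # On Windows, winerror holds the Win32 code (53 = network path not found);
        # elsewhere fall back to the POSIX errno.
        code = getattr(exc, "winerror", None) or exc.errno
        return False, f"error {code}: {exc.strerror}"
```

On a cluster node, a call such as `check_witness_share(r"\\2ABCRV094\SQLWitness", "fsw-test")` uses a throwaway directory name rather than the live cluster GUID folder. If it returns an error 53, that matches the failure recorded in the cluster log.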
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
3/20/2017 | 12:27:07 AM | Warning | 2ABCRV038.abc.uk | 1562 | Microsoft-Windows-FailoverClustering | File share witness resource ‘File Share Witness’ failed a periodic health check on file share ‘\\2ABCRV094\SQLWitness’. Please ensure that file share ‘\\2ABCRV094\SQLWitness’ exists and is accessible by the cluster.
3/20/2017 | 12:27:07 AM | Error | 2ABCRV038.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘File Share Witness’ of type ‘File Share Witness’ in clustered role ‘Cluster Group’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
3/20/2017 | 12:27:28 AM | Critical | 2ABCRV038.abc.uk | 1564 | Microsoft-Windows-FailoverClustering | File share witness resource ‘File Share Witness’ failed to arbitrate for the file share ‘\\2ABCRV094\SQLWitness’. Please ensure that file share ‘\\2ABCRV094\SQLWitness’ exists and is accessible by the cluster.
3/20/2017 | 12:27:28 AM | Error | 2ABCRV038.abc.uk | 1205 | Microsoft-Windows-FailoverClustering | The Cluster service failed to bring clustered role ‘Cluster Group’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
- After this, the events show the cluster Network Name resource going into a failed state.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
3/20/2017 | 9:29:16 AM | Error | 2ABCRV038.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘2ABCSQL_2ABCSQL’ of type ‘Network Name’ in clustered role ‘2ABCSQL’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Application Events:
058952 000019e0.0000270c::2017/03/11-00:28:26.694 ERR [RES] File Share Witness <File Share Witness>: Failed to create or open directory \\2ABCRV094\SQLWitness\e8c5e3e9-c9cf-4f79-8367-a0265b3627cf, error 53.
058953 000019e0.0000270c::2017/03/11-00:28:26.694 ERR [RES] File Share Witness <File Share Witness>: Failed to validate an access to the active share \\2ABCRV094\SQLWitness\e8c5e3e9-c9cf-4f79-8367-a0265b3627cf with 53.
058954 000019e0.0000270c::2017/03/11-00:28:26.694 ERR [RES] File Share Witness <File Share Witness>: Failed to create or open directory \\2ABCRV094\SQLWitness\e8c5e3e9-c9cf-4f79-8367-a0265b3627cf, error 53.
058955 000019e0.0000270c::2017/03/11-00:28:26.694 ERR [RES] File Share Witness <File Share Witness>: Failed to validate an access to the active share \\2ABCRV094\SQLWitness\e8c5e3e9-c9cf-4f79-8367-a0265b3627cf with 53.
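A small parser makes it easier to scan a long cluster log for entries like these. The Python sketch below assumes the layout shown above. Each line may start with a number, which is assumed here to be a line number added by the viewer the excerpt was copied from. That is followed by process.thread IDs, a timestamp, a level, a [component] tag, and the message. The regular expressions are fitted to these four lines only, so other message formats may need adjusting. Cluster log timestamps are normally written in UTC unless the log was generated with Get-ClusterLog -UseLocalTime, so compare them with Event Viewer times with care.

```python
import re
from datetime import datetime

# Assumed layout: [line number] pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message
LINE_RE = re.compile(
    r"^(?:(?P<lineno>\d+)\s+)?"
    r"(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+\[(?P<component>\w+)\]\s+"
    r"(?P<message>.*)$"
)
# Trailing Win32 code, as in "..., error 53." or "... with 53."
CODE_RE = re.compile(r"(?:error|with)\s+(\d+)\.?\s*$")
# Resource name in angle brackets, as in "<File Share Witness>:"
RESOURCE_RE = re.compile(r"<(?P<resource>[^>]+)>:")


def parse_cluster_log_line(line):
    """Return a dict of fields for one cluster log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    message = m.group("message")
    code = CODE_RE.search(message)
    resource = RESOURCE_RE.search(message)
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m.group("level"),
        "component": m.group("component"),
        "resource": resource.group("resource") if resource else None,
        "win32_error": int(code.group(1)) if code else None,
        "message": message,
    }
```

Filtering the parsed records on `level == "ERR"` and a specific `win32_error` gives a quick count of how often the witness hit error 53.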
Error 53
The most common symptom of a problem in NetBIOS name resolution is when the Ping utility returns an Error 53 message. The Error 53 message is generally returned when name resolution fails for a particular computer name. Error 53 can also occur when there is a problem establishing a NetBIOS session. To distinguish between these two cases, use the following procedure:
To determine the cause of an Error 53 message
- From the Start menu, open a command prompt.
- At the command prompt, type:
net view \\<hostname>
where <hostname> is a network resource you know is active.
If this works, name resolution is probably not the source of the problem. To confirm this, ping the host name, because name resolution can sometimes function properly and yet net use still returns an Error 53 (for example, when a DNS or WINS server has a bad entry). If Ping also shows that name resolution fails (by returning the “Unknown host” message), check the status of your NetBIOS session.
To check the status of your NetBIOS session
- From the Start menu, open a command prompt.
- At the command prompt, type:
net view \\<IP address>
where <IP address> is the IP address of the same network resource you used in the procedure above. If this also fails, the problem is in establishing a session.
- If the command succeeds, you can make a connection to that share. However, if you get the message “System error 53 has occurred. The network path was not found,” this indicates a TCP/IP configuration problem with the network card.
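The two procedures above separate a name resolution failure from a session failure. The Python sketch below approximates the same decision by resolving the name first and then trying a TCP connection to the resolved address. It rests on two assumptions. First, getaddrinfo uses the operating system's resolver rather than calling NetBIOS name resolution directly. Second, the probe port defaults to 445 (SMB over TCP); pass 139 to test the NetBIOS session service instead. The function names are hypothetical.

```python
import socket


def resolve(hostname):
    """Return the first IPv4 address for hostname, or None if resolution fails."""
    try:
        return socket.getaddrinfo(hostname, None, socket.AF_INET)[0][4][0]
    except socket.gaierror:
        return None


def can_open_session(ip, port=445, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(resolved, session_ok):
    """Map the two checks onto the procedure: name resolution first, then session."""
    if not resolved:
        return "name resolution failed"
    if not session_ok:
        return "name resolves, but a session to the host cannot be established"
    return "name resolution and session both OK"


def triage(hostname, port=445):
    """Run both checks against hostname and return a one-line diagnosis."""
    ip = resolve(hostname)
    session_ok = can_open_session(ip, port) if ip else False
    return classify(ip is not None, session_ok)
```

Running `print(triage("2ABCRV094"))` from each cluster node gives a first indication of which half of the Error 53 procedure to follow up.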
System Information: 2ABCRV039
OS Name Microsoft Windows Server 2012 R2 Standard
Version 6.3.9600 Build 9600
Other OS Description Not Available
OS Manufacturer Microsoft Corporation
System Name 2ABCRV039
System Manufacturer Dell Inc.
System Model PowerEdge R730xd
System Type x64-based PC
System SKU SKU=NotProvided;ModelName=PowerEdge R730xd
Processor Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 2200 Mhz, 12 Core(s), 24 Logical Processor(s)
Processor Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 2200 Mhz, 12 Core(s), 24 Logical Processor(s)
BIOS Version/Date Dell Inc. 2.3.4, 08/11/2016
System Events:
- Analysis of the logs indicates that a network issue caused the SQL Server Availability Group resource to go into a failed state.
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
3/20/2017 | 9:25:26 AM | Error | 2ABCRV039.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘2ABCSQL’ of type ‘SQL Server Availability Group’ in clustered role ‘2ABCSQL’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
3/20/2017 | 9:25:27 AM | Error | 2ABCRV039.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘2ABCSQL’ of type ‘SQL Server Availability Group’ in clustered role ‘2ABCSQL’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
3/20/2017 | 9:25:27 AM | Error | 2ABCRV039.abc.uk | 1205 | Microsoft-Windows-FailoverClustering | The Cluster service failed to bring clustered role ‘2ABCSQL’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
3/20/2017 | 9:25:27 AM | Error | 2ABCRV039.abc.uk | 1254 | Microsoft-Windows-FailoverClustering | Clustered role ‘2ABCSQL’ has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
3/20/2017 | 9:26:58 AM | Error | 2ABCRV039.abc.uk | 4 | Microsoft-Windows-Security-Kerberos | The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server 2ABCRV039$. The target name used was HTTP/2GSQLCLUSTER.abc.uk. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server. This error can also happen if the target service account password is different than what is configured on the Kerberos Key Distribution Center for that target service. Ensure that the service on the server and the KDC are both configured to use the same password. If the server name is not fully qualified, and the target domain (abc.uk) is different from the client domain (abc.uk), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
3/20/2017 | 9:27:31 AM | Error | 2ABCRV039.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘2ABCSQL’ of type ‘SQL Server Availability Group’ in clustered role ‘2ABCSQL’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
3/20/2017 | 9:27:31 AM | Error | 2ABCRV039.abc.uk | 1205 | Microsoft-Windows-FailoverClustering | The Cluster service failed to bring clustered role ‘2ABCSQL’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
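Event 1254 reflects the role's failover policy. If the role fails more than a set number of times within a set period, the cluster stops retrying. In Failover Cluster Manager these are the role's "Maximum failures in the specified period" and "Period (hours)" settings. PowerShell exposes them as the FailoverThreshold and FailoverPeriod properties of the object returned by Get-ClusterGroup. The Python sketch below is a simplified, hypothetical sliding-window model of that behavior, not the Cluster service's implementation. The threshold and period in the example are illustrative, so check the actual values configured on the 2ABCSQL role.

```python
from datetime import datetime, timedelta


def exceeds_failover_threshold(failure_times, threshold, period):
    """Return True if more than `threshold` failures fall inside any window of length `period`.

    Simplified, hypothetical model of the FailoverThreshold / FailoverPeriod
    policy behind event 1254; not the Cluster service's actual algorithm.
    """
    times = sorted(failure_times)
    start = 0
    for end, current in enumerate(times):
        # Advance the window start until the window spans at most `period`.
        while current - times[start] > period:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False


if __name__ == "__main__":
    # The two availability group failures logged on 2ABCRV039 (event 1069).
    failures = [datetime(2017, 3, 20, 9, 25, 26), datetime(2017, 3, 20, 9, 25, 27)]
    # Illustrative policy values only; read the real ones from the 2ABCSQL role.
    print(exceeds_failover_threshold(failures, threshold=1, period=timedelta(hours=6)))
```

With two failures one second apart and a threshold of 1, the model reports that the role would be left failed. This is consistent with event 1254 being logged in the same second as the second failure.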
- At this point, the cluster cannot communicate on the network that hosts IP address 10.175.00.154, because that network is shown as unavailable. The return code 5035 reported in the events below decodes as:
‘.\err(vista).exe’ 5035
# for decimal 5035 / hex 0x13ab
ERROR_NETWORK_NOT_AVAILABLE winerror.h
# A cluster network is not available for this operation.
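You can reproduce the err.exe lookup above for the two Win32 codes that appear in this report. The Python sketch below keeps a deliberately minimal local table (53 and 5035 only) and prints each code in decimal and hexadecimal. On Windows it falls back to ctypes.FormatError for codes that are not in the table. The table is an assumption-limited convenience, not a substitute for winerror.h.

```python
import sys

# Only the Win32 codes that appear in this report; extend as needed.
KNOWN_CODES = {
    53: ("ERROR_BAD_NETPATH", "The network path was not found."),
    5035: ("ERROR_NETWORK_NOT_AVAILABLE",
           "A cluster network is not available for this operation."),
}


def describe_win32_error(code: int) -> str:
    """Format a Win32 error code as 'decimal (hex) NAME: message'."""
    name, text = KNOWN_CODES.get(code, ("UNKNOWN", "not in the local table"))
    if name == "UNKNOWN" and sys.platform == "win32":
        import ctypes  # FormatError is only available on Windows
        text = ctypes.FormatError(code)
    return f"{code} (0x{code:x}) {name}: {text}"


if __name__ == "__main__":
    for code in (53, 5035):
        print(describe_win32_error(code))
```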
Date | Time | Type/Level | Computer Name | Event Code | Source | Description
3/20/2017 | 9:33:30 AM | Warning | 2ABCRV039.abc.uk | 1045 | Microsoft-Windows-FailoverClustering | No matching network interface found for resource ‘2ABCSQL_10.175.00.154’ IP address ‘10.175.00.154’ (return code was ‘5035’). If your cluster nodes span different subnets, this may be normal.
3/20/2017 | 9:33:30 AM | Error | 2ABCRV039.abc.uk | 1069 | Microsoft-Windows-FailoverClustering | Cluster resource ‘2ABCSQL_10.175.00.154’ of type ‘IP Address’ in clustered role ‘2ABCSQL’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
3/20/2017 | 9:33:34 AM | Warning | 2ABCRV039.abc.uk | 1045 | Microsoft-Windows-FailoverClustering | No matching network interface found for resource ‘2ABCSQL_10.175.00.154’ IP address ‘10.175.00.154’ (return code was ‘5035’). If your cluster nodes span different subnets, this may be normal.
3/20/2017 | 9:41:34 AM | Warning | 2ABCRV039.abc.uk | 4 | b57nd60a | Broadcom NetXtreme Gigabit Ethernet #6: The network link is down. Check to make sure the network cable is properly connected.
_________________________________________________________________________________________________
Conclusion:
- Based on the logs, we conclude that the issue started with the cluster network. Node 2ABCRV038 was unable to communicate with the external environment, which caused the File Share Witness to fail and, later, the entire node.
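It can help to merge the events from both nodes into one timeline when checking this conclusion against the evidence. The Python sketch below uses a subset of the events quoted above. It assumes that both nodes' clocks are synchronized and in the same time zone, and that the date/time strings use the US month/day format shown in the event tables. The short summaries are paraphrases of the event descriptions.

```python
from datetime import datetime

# Subset of the events quoted in this report:
# (node, date/time as shown in the event tables, event ID, short summary).
EVENTS = [
    ("2ABCRV039", "3/20/2017 9:41:34 AM", 4, "NIC: network link is down"),
    ("2ABCRV038", "3/20/2017 12:27:07 AM", 1562, "File share witness failed periodic health check"),
    ("2ABCRV038", "3/20/2017 12:27:28 AM", 1564, "File share witness failed to arbitrate"),
    ("2ABCRV039", "3/20/2017 9:25:26 AM", 1069, "Availability group resource failed"),
    ("2ABCRV039", "3/20/2017 9:25:27 AM", 1254, "Role 2ABCSQL exceeded failover threshold"),
    ("2ABCRV038", "3/20/2017 9:29:16 AM", 1069, "Network Name resource failed"),
    ("2ABCRV039", "3/20/2017 9:33:30 AM", 1045, "No matching interface for 10.175.00.154"),
]

TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def timeline(events):
    """Return the events from all nodes in chronological order.

    Assumes both nodes' clocks are synchronized and use the same time zone.
    """
    return sorted(events, key=lambda e: datetime.strptime(e[1], TIME_FORMAT))


if __name__ == "__main__":
    for node, when, event_id, summary in timeline(EVENTS):
        print(f"{when:>22}  {node}  {event_id:>5}  {summary}")
```

The merged view shows how far apart the witness failure (around 12:27 AM) and the role failures (from around 9:25 AM) are. This is worth keeping in mind when matching each failure to a network event.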
Plan:
- Based on the article https://technet.microsoft.com/en-us/library/hh289314.aspx, I recommend updating the network adapter drivers, because the issue could be caused by a device driver problem that disconnected the cluster node from the environment.
- Please install the latest recommended hotfixes and updates on both cluster nodes: https://support.microsoft.com/en-us/help/2920151/recommended-hotfixes-and-updates-for-windows-server-2012-r2-based-failover-clusters
- Investigate network timeouts, latency, and packet drops with the help of the in-house networking team.
Please note: This step is the most critical when dealing with network connectivity issues.
Investigation of Network Issues:
We need to investigate the network connectivity issues with the help of the in-house networking team.
To avoid this issue in the future, the most critical step is to diagnose and investigate the persistent network connectivity issues on the cluster networks.
We need to check the network adapters, cables, and network configuration for the networks that connect the nodes.
We also need to check the hubs, switches, or bridges in the networks that connect the nodes.
We need to check for switch delays and proxy ARP with the help of the in-house networking team.
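As a starting point for the timeout, latency, and packet-drop investigation, the Python sketch below repeatedly opens TCP connections to a host and reports the failure rate and median connect time. It is a crude, application-level probe with clear limits. It does not exercise the cluster heartbeat traffic itself, which uses UDP port 3343. It also cannot locate switch delays or proxy ARP problems, which still need the networking team's switch counters and packet captures. The function names and default values are hypothetical.

```python
import socket
import statistics
import time


def probe_tcp(host, port, attempts=10, timeout=2.0, interval=0.5):
    """Open `attempts` TCP connections to host:port and record each connect time.

    Returns (latencies_ms, failures). A crude connectivity probe, not a
    replacement for switch counters or a packet capture.
    """
    latencies_ms = []
    failures = 0
    for i in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                latencies_ms.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # includes timeouts and refused connections
            failures += 1
        if i < attempts - 1:
            time.sleep(interval)
    return latencies_ms, failures


def summarize(latencies_ms, failures):
    """Return attempt count, failure rate (0.0 to 1.0), and median connect time in ms."""
    total = len(latencies_ms) + failures
    failure_rate = failures / total if total else 0.0
    median_ms = statistics.median(latencies_ms) if latencies_ms else None
    return {"attempts": total, "failure_rate": failure_rate, "median_ms": median_ms}
```

For example, you can run `summarize(*probe_tcp("2ABCRV094", 445, attempts=60))` from each node and compare the results. This can show whether connections to the witness server fail intermittently from one node but not the other.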