RCA – 23 – Unable to bring Resource Online on a Cluster

Issue Description:

 

Unable to bring the resource SQLPRDCL01 online on the cluster XyzAbcSQLCL01, which runs Microsoft Windows Server 2016 Datacenter, Version 10.0.14393 Build 14393.

___________________________________________________

 

System Information: ABCDFRSEDBVM02

 

OS Name        Microsoft Windows Server 2016 Datacenter

Version        10.0.14393 Build 14393

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ABCDFRSEDBVM02

System Manufacturer        Microsoft Corporation

System Model        Virtual Machine

System Type        x64-based PC

System SKU        Unsupported

Processor        Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2295 Mhz, 2 Core(s), 4 Logical Processor(s)

BIOS Version/Date        American Megatrends Inc. 090006, 1/6/2017

SMBIOS Version        2.3

 

System Events:

 

  • Reviewed the System log and found that the cluster resource SQLPRDCL1 failed with Event ID 1069; however, the event gives no reason for the failure.

Date        10/19/2017
Time        12:48:53 AM
Type/Level        Error
Computer Name        ABCDFRSEDBVM02.abc.com
Event Code        1069
Source        Microsoft-Windows-FailoverClustering
Description        Cluster resource ‘SQL Network Name (SQLPRDCL1)’ of type ‘Network Name’ in clustered role ‘SQL Server (SCCM)’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Date        10/19/2017
Time        12:48:53 AM
Type/Level        Error
Computer Name        ABCDFRSEDBVM02.abc.com
Event Code        1205
Source        Microsoft-Windows-FailoverClustering
Description        The Cluster service failed to bring clustered role ‘SQL Server (SCCM)’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

 

Application Events:

 

 

  • The Application log shows a similar picture for SQL Server: warnings are logged, but again with no specific reason.
 
 

Date        10/19/2017
Time        12:06:13 AM
Type/Level        Warning
Computer Name        ABCDFRSEDBVM02.abc.com
Event Code        60582
Source        Microsoft SQL Server Management
Description        [Warning] ServerConfigurationsManagementConnectionStatusErrored: System.Exception: Unable to connect to SQL Server ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) ---> System.ComponentModel.Win32Exception: The system cannot find the file specified

Date        10/19/2017
Time        12:37:53 AM
Type/Level        Warning
Computer Name        ABCDFRSEDBVM02.abc.com
Event Code        60644
Source        Microsoft SQL Server Management
Description        [Warning] ServerConfigurationsPluginCopyFileWorkerFailure: System.Exception: Error in copy file worker thread ---> System.Exception: Could not get ‘MSSQLSERVER’ keyValueName at ‘hklm\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL’    at

Cluster Logs:

 

218250 00000ae8.0000178c::2017/10/14-13:30:07.805 WARN  [API] s_ApiOpenResourceEx: Resource 38e03198-7f7a-4daa-84fe-baacfbbb329f not found, status = 5007

218305 000012e4.00000964::2017/10/14-13:30:46.040 WARN  [RHS] Error 50 from resource type control for restype Storage Replica.

218863 00000ae8.000016b4::2017/10/14-13:40:48.073 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
218911 00000ae8.0000097c::2017/10/14-13:41:48.073 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
218956 000012e4.00000964::2017/10/14-13:42:46.083 WARN  [RHS] Error 50 from resource type control for restype Storage Replica.
218959 000012e4.00000964::2017/10/14-13:42:46.086 WARN  [RHS] Error 50 from resource type control for restype Storage Replica.
218962 00000ae8.0000097c::2017/10/14-13:42:48.073 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
218982 000016ec.0000150c::2017/10/14-13:43:10.247 ERR   [RHS – Timeout] Resource ‘SQL Network Name (SQLPRDCL01)’ has not responded to the call ONLINERESOURCE:0. The timeout to respond has been exceeded by 15 milliseconds, taking recovery actions.
218983 000016ec.0000150c::2017/10/14-13:43:10.247 ERR   [RHS-Timeout] Health Monitoring Failure: Resource SQL Network Name (SQLPRDCL01) is not functioning as expected. Cancelling current operation.
218984 00000ae8.00001294::2017/10/14-13:43:10.247 WARN  [RCM] HandleMonitorReply: FAILURENOTIFICATION for ‘SQL Network Name (SQLPRDCL01)’, gen(40) result 5018/0.
218989 00000ae8.00001294::2017/10/14-13:43:10.247 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQL Network Name (SQLPRDCL01))
219005 000016ec.0000150c::2017/10/14-13:43:10.277 WARN  [RHS] Failed to open process handle to process with name lsass.exe with error 87
219006 000016ec.0000150c::2017/10/14-13:43:10.277 WARN  [RHS] Failed to open process handle to process with name lsass.exe with error 5
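When a cluster log runs to hundreds of thousands of lines, it helps to filter it down to the errors that mention the failing resource. The following is a minimal Python sketch of that idea. The line layout in the regular expression is only inferred from the excerpt above, and the names `parse_line` and `errors_for` are made up for this illustration; they are not part of any Microsoft tool.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the excerpt above (not an official format definition):
# <seq> <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> [<component>] <message>
LINE_RE = re.compile(
    r"^(?:(?P<seq>\d+)\s+)?"                      # optional leading line number
    r"(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s*"
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one cluster log line; return None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["ts"], m["level"], m["component"], m["message"])


def errors_for(lines, needle):
    """Return the ERR entries whose message mentions `needle`."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e.level == "ERR" and needle in e.message]


# Three lines copied from the excerpt above.
sample = [
    "218956 000012e4.00000964::2017/10/14-13:42:46.083 WARN  [RHS] Error 50 from resource type control for restype Storage Replica.",
    "218983 000016ec.0000150c::2017/10/14-13:43:10.247 ERR   [RHS-Timeout] Health Monitoring Failure: Resource SQL Network Name (SQLPRDCL01) is not functioning as expected. Cancelling current operation.",
    "218989 00000ae8.00001294::2017/10/14-13:43:10.247 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQL Network Name (SQLPRDCL01))",
]

for entry in errors_for(sample, "SQLPRDCL01"):
    print(entry.timestamp, entry.component, entry.message[:60])
```

Filtering on the resource name rather than on a component tag keeps the RHS, RCM and RES entries for the same resource together, which is usually what you want when building a timeline.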

 

603420 00000a9c.00002374::2017/10/19-04:26:18.546 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
603459 00000a9c.00002374::2017/10/19-04:27:18.550 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
603504 00000a9c.000000a8::2017/10/19-04:28:18.548 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
603526 000004c4.00001c6c::2017/10/19-04:28:52.180 WARN  [RES] Network Name <SQL Network Name (SQLPRDCL1)>: AccountAD: Operation got canceled, state: Initializing/Reading

603535 000004c4.00000e14::2017/10/19-04:28:52.186 ERR   [RES] Network Name <SQL Network Name (SQLPRDCL1)>: Online thread Failed: (0)’ because of ‘Initializing netname configuration for SQL Network Name (SQLPRDCL1) failed with error 995.’
603537 000004c4.00000e14::2017/10/19-04:28:52.186 WARN  [RHS] returning ResourceExitStateTerminate.
603573 00000a9c.000000a8::2017/10/19-04:28:52.190 WARN  [RCM] Queueing immediate delay restart of resource SQL Network Name (SQLPRDCL1) in 500 ms.
603634 0000125c.000012e8::2017/10/19-04:28:52.918 WARN  [RHS] Error 50 from resource type control for restype Storage Replica.
603664 00000a9c.00002374::2017/10/19-04:29:18.548 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
603703 00000a9c.0000029c::2017/10/19-04:30:18.548 ERR   [API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.
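The numeric codes in these excerpts are easier to reason about once they are mapped to their symbolic names. The small Python lookup below is a sketch: the table covers only the Win32 codes that appear in the log lines above, with the names from winerror.h, and the helper `describe` is a hypothetical name chosen for this example. Codes that are not in the table, such as the large value logged by ApipGetLocalCallerInfo, are simply shown in hexadecimal so they can be looked up elsewhere.

```python
# Win32 error codes seen in the excerpts above, with their winerror.h names.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    50: "ERROR_NOT_SUPPORTED",
    87: "ERROR_INVALID_PARAMETER",
    995: "ERROR_OPERATION_ABORTED",
    5007: "ERROR_RESOURCE_NOT_FOUND",
    5018: "ERROR_RESMON_ONLINE_FAILED",
}


def describe(code: int) -> str:
    """Return a readable form of a numeric error code from the log."""
    if code in WIN32_ERRORS:
        return f"{code} = {WIN32_ERRORS[code]}"
    # Unknown codes are shown as 32-bit hex, the form most references use.
    return f"{code} = 0x{code & 0xFFFFFFFF:08X} (not in the local table)"


for code in (995, 5018, 50, 3221356570):
    print(describe(code))
```

Read this way, error 995 on the Network Name resource says that the operation was aborted, which matches the "Operation got canceled" warning logged just before it.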

 

__________________________________________________________________

 

System Information: ABCDFRSEDBVM03

 

OS Name        Microsoft Windows Server 2016 Datacenter

Version        10.0.14393 Build 14393

Other OS Description         Not Available

OS Manufacturer        Microsoft Corporation

System Name        ABCDFRSEDBVM03

System Manufacturer        Microsoft Corporation

System Model        Virtual Machine

System Type        x64-based PC

System SKU        Unsupported

Processor        Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2295 Mhz, 2 Core(s), 4 Logical Processor(s)

BIOS Version/Date        American Megatrends Inc. 090006, 1/6/2017

SMBIOS Version        2.3

 

System Events:

 

 

Date        10/19/2017
Time        12:42:55 AM
Type/Level        Error
Computer Name        ABCDFRSEDBVM03.abc.com
Event Code        10028
Source        Microsoft-Windows-DistributedCOM
Description        DCOM was unable to communicate with the computer XyzAbcSQLCL01.abc.com using any of the configured protocols; requested by PID     1e88 (C:\Windows\system32\ServerManager.exe).

Date        10/19/2017
Time        12:42:55 AM
Type/Level        Error
Computer Name        ABCDFRSEDBVM03.abc.com
Event Code        10028
Source        Microsoft-Windows-DistributedCOM
Description        DCOM was unable to communicate with the computer XyzAbcSQLCL01.abc.com using any of the configured protocols; requested by PID     1e88 (C:\Windows\system32\ServerManager.exe).
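Event 10028 only says that DCOM could not reach the cluster name. A quick way to narrow that down is to check whether the name resolves and whether the RPC endpoint mapper port (TCP 135) accepts a connection. The Python sketch below illustrates that check. The function name `probe` and its `connect` parameter are made up for this example (the parameter exists only so the function can be exercised without a network), and the check is a rough first test, not a full DCOM diagnosis.

```python
import socket


def probe(host, port=135, timeout=3.0, connect=socket.create_connection):
    """Try one TCP connection to host:port and return (reachable, detail).

    Port 135 is the RPC endpoint mapper, which DCOM contacts first.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError as exc:
        # OSError covers DNS failures (socket.gaierror), refusals and timeouts.
        return False, f"{type(exc).__name__}: {exc}"
    conn.close()
    return True, "connected"
```

Calling `probe("XyzAbcSQLCL01.abc.com")` from the node that logged the event returns a pair such as `(False, "gaierror: ...")` when the name does not resolve, which points at DNS or at a missing or disabled computer object rather than at DCOM itself.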

 

____________________________________________________________________

 

 

Conclusion:

 

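  • The cluster logs show that the cluster cannot communicate properly with Active Directory for the virtual computer object (VCO) behind the Network Name resource: the AccountAD operation was canceled while the resource was initializing, and initialization of the netname configuration failed with error 995.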

 

  • The cluster name object (CNO) XyzAbcSQLCL01$ does not have the Create Computer objects and Read All Properties permissions on the OU where the CNO is located, so it cannot create the VCO for the SQL Network Name resource.

 

Please work with your domain administrator to ensure that:

 

  • The cluster identity ‘XyzAbcSQLCL01$’ has Create Computer Objects permissions. By default all computer objects are created in the same container as the cluster identity ‘XyzAbcSQLCL01$’.
  • The quota for computer objects has not been reached.
  • If there is an existing computer object, verify the Cluster Identity ‘XyzAbcSQLCL01$’ has ‘Full Control’ permission to that computer object using the Active Directory Users and Computers tool.
  • The SQL resource depends on the Network Name resource. In this case the dependency was missing, so we edited the SQL resource settings, added the network name to the dependencies of the SQL resource, and then tried again to bring the SQL resource online.
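The dependency check in the last point can be scripted once the dependency expression of the SQL resource has been exported, for example from the output of the Get-ClusterResourceDependency cmdlet. The Python sketch below assumes the bracketed expression format, with each resource name inside square brackets; the two expressions and the disk name are hypothetical and were not taken from the affected cluster, and `dependency_names` and `depends_on` are names invented for this example.

```python
import re


def dependency_names(expression: str) -> list:
    """Extract the resource names from a bracketed dependency expression."""
    return re.findall(r"\[([^\]]+)\]", expression)


def depends_on(expression: str, resource: str) -> bool:
    """Return True if `resource` is listed in the dependency expression."""
    return resource in dependency_names(expression)


# Hypothetical expressions for illustration only.
before = "([Cluster Disk 1])"
after = "([Cluster Disk 1]) and ([SQL Network Name (SQLPRDCL1)])"

print(depends_on(before, "SQL Network Name (SQLPRDCL1)"))  # False
print(depends_on(after, "SQL Network Name (SQLPRDCL1)"))   # True
```

Matching on the full resource name avoids a false positive when one resource name happens to be a substring of another.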

Ashutosh Dixit

